Ivan,
Thanks for checking in...
dspace filter-media returns with exit status 0. The dspace log shows no
errors, just entries of the form:
2013-09-23 10:37:41,012 INFO org.dspace.search.DSIndexer @ Writing
Community: 2408/104859 to Index
or:
2013-09-23 10:37:40,336 INFO org.dspace.search.DSIndexer @ Writing
Collection: 2408/55874 to Index
The output from the command line is short. Normally, I would expect to see
one line for each bitstream examined, beginning with 'FILTERED' or 'SKIPPED'.
Instead I see only a few errors for .doc files (Invalid Format), followed
by a couple of SKIPPED entries for bitstreams with an existing .txt file.
All the .pdf files are in the ORIGINAL bundle. For instance:
dspace=> select * from item2bundle where item_id = 34950;
-[ RECORD 1 ]----
id | 39982
item_id | 34950
bundle_id | 39983
-[ RECORD 2 ]----
id | 39983
item_id | 34950
bundle_id | 39984
dspace=> select * from bundle where bundle_id in ( 39983, 39984 );
-[ RECORD 1 ]--------+---------
bundle_id | 39983
name | LICENSE
primary_bitstream_id |
-[ RECORD 2 ]--------+---------
bundle_id | 39984
name | ORIGINAL
primary_bitstream_id |
dspace=> select * from bundle2bitstream where bundle_id = 39984;
-[ RECORD 1 ]---+------
id | 40042
bundle_id | 39984
bitstream_id | 40065
bitstream_order | 2
dspace=> select * from bitstream where bitstream_id = 40065;
-[ RECORD 1 ]-----------+------------------------------------------------
bitstream_id | 40065
bitstream_format_id | 3
name | 8175706.pdf
size_bytes | 6587102
checksum | 164de17195af1d0de45cd17a431fc2b9
checksum_algorithm | MD5
description |
user_format_description |
source | /dspace/assetstore/dspace-sr/upload/8175706.pdf
internal_id | 104968051252620967298398595849898250327
deleted | f
store_number | 0
sequence_id | 2
This bitstream, however, is reported as neither FILTERED nor SKIPPED.
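The four lookups above can be collapsed into a single join. The following is only a minimal sketch: it assumes the table and column names shown in the psql output, loads the rows above into an in-memory SQLite database so that it runs stand-alone (the real database is PostgreSQL), and the function name original_bitstreams is hypothetical.

```python
import sqlite3


def original_bitstreams(conn, item_id):
    """Return (bitstream_id, name, bitstream_format_id, deleted) for every
    bitstream in the ORIGINAL bundle(s) of the given item."""
    sql = """
        SELECT bs.bitstream_id, bs.name, bs.bitstream_format_id, bs.deleted
          FROM item2bundle i2b
          JOIN bundle b ON b.bundle_id = i2b.bundle_id
          JOIN bundle2bitstream b2b ON b2b.bundle_id = b.bundle_id
          JOIN bitstream bs ON bs.bitstream_id = b2b.bitstream_id
         WHERE i2b.item_id = ? AND b.name = 'ORIGINAL'
         ORDER BY b2b.bitstream_order
    """
    return conn.execute(sql, (item_id,)).fetchall()


# Stand-in database holding only the columns and rows quoted above.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE item2bundle (id INTEGER, item_id INTEGER, bundle_id INTEGER);
CREATE TABLE bundle (bundle_id INTEGER, name TEXT, primary_bitstream_id INTEGER);
CREATE TABLE bundle2bitstream (id INTEGER, bundle_id INTEGER,
                               bitstream_id INTEGER, bitstream_order INTEGER);
CREATE TABLE bitstream (bitstream_id INTEGER, bitstream_format_id INTEGER,
                        name TEXT, deleted TEXT);
INSERT INTO item2bundle VALUES (39982, 34950, 39983);
INSERT INTO item2bundle VALUES (39983, 34950, 39984);
INSERT INTO bundle VALUES (39983, 'LICENSE', NULL);
INSERT INTO bundle VALUES (39984, 'ORIGINAL', NULL);
INSERT INTO bundle2bitstream VALUES (40042, 39984, 40065, 2);
INSERT INTO bitstream VALUES (40065, 3, '8175706.pdf', 'f');
""")

rows = original_bitstreams(conn, 34950)
print(rows)
```

The SELECT itself should also work when pasted into psql, with the ? placeholder replaced by a literal item_id.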
This database has been recently updated from v1.42 to v3, and I suspect the
problem is somewhere in the db rather than a bug in the code, but
everything *looks* right to me. I can trace the relations from the
community to collection to item, but for some reason the bitstreams are
simply not checked.
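For reference, the selection described in the quoted message below (a recursive walk down the hierarchy that keeps only bitstreams in bundles named ORIGINAL) can be modelled roughly as follows. This is a toy sketch with plain dicts standing in for DSpace objects, not the MediaFilterManager code; the function name, the data layout, and the file names license.txt and cover.jpg are made up for illustration.

```python
def original_bitstreams_under(node):
    """Yield the bitstreams found in ORIGINAL bundles anywhere beneath a
    community, collection, or item (each modelled as a plain dict)."""
    children = (node.get("communities", [])
                + node.get("collections", [])
                + node.get("items", []))
    for child in children:
        # Recurse: community -> sub-community/collection -> item.
        yield from original_bitstreams_under(child)
    for bundle in node.get("bundles", []):
        # Only bundles named ORIGINAL are candidates for filtering.
        if bundle["name"] == "ORIGINAL":
            yield from bundle["bitstreams"]


# Hypothetical repository: one community, one collection, two items.
repository = {
    "communities": [{
        "collections": [{
            "items": [
                {"bundles": [
                    {"name": "LICENSE", "bitstreams": ["license.txt"]},
                    {"name": "ORIGINAL", "bitstreams": ["8175706.pdf"]},
                ]},
                {"bundles": [
                    {"name": "THUMBNAIL", "bitstreams": ["cover.jpg"]},
                ]},
            ],
        }],
    }],
}

selected = list(original_bitstreams_under(repository))
print(selected)
```

In this model only 8175706.pdf is selected; anything in a LICENSE or THUMBNAIL bundle is never visited.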
What do you think?
Bill
On Sun, Sep 22, 2013 at 12:35 PM, helix84 <[email protected]> wrote:
> Hi Bill, please remember to keep dspace-tech in CC.
>
> Can you please tell me what the result of each of my suggestions was?
> 1) What was the errorlevel of your filter-media command?
> 2) Did you look at the log while it was running using "tail -f"?
> 3) Were all the bitstreams you expected to be filtered in the ORIGINAL
> bundle? (check at least a few)
>
>
> On Fri, Sep 20, 2013 at 10:09 PM, Bill Tantzen <[email protected]> wrote:
> > Hi Ivan!
> >
> > I've tried all these suggestions, and still, no success.
> >
> > There are no errors in the log, only entries of the form:
> >
> > 2013-09-20 15:00:24,802 INFO org.dspace.search.DSIndexer @ Writing
> > Community: 2408/36293 to Index
> >
> > And
> >
> > 2013-09-20 15:00:17,990 INFO org.dspace.search.DSIndexer @ Writing
> > Collection: 2408/35292 to Index
> >
> > One for each community and collection. The bundles are ORIGINAL, nothing
> > special here...
> >
> > The database seems OK, I am able to follow the communities to
> > collections to items just fine, but no bitstreams are being filtered.
> >
> > I'll keep debugging on my end, but if you have any other ideas, do pass
> > them my way!
> > Bill
> >
> >
> > On Thu, Sep 19, 2013 at 9:08 AM, helix84 <[email protected]> wrote:
> >>
> >> Hi Bill,
> >>
> >> Jose's suggestion to look at the logs for errors is a good one. First
> >> of all, we should determine whether the filtering failed during
> >> processing some item or whether it completed with nothing else to
> >> process.
> >>
> >> Also check the errorlevel of the command. 1 means error, 0 means
> >> success.
> >>
> >>
> >> On Thu, Sep 19, 2013 at 3:03 PM, Bill Tantzen <[email protected]> wrote:
> >> > Still working on this media filter issue -- maybe this might point me
> >> > in the right direction: how are bitstreams selected for filtering? Is
> >> > it something like SELECT * FROM bitstream WHERE ???
> >> > What is in the WHERE clause? Or is there some other basis for
> >> > selection?
> >>
> >> No, it's not SQL. It's a recursive call down the hierarchy, as you can
> >> see in this method and the few following it: [1]
> >>
> >> However your WHERE suggestion got me thinking which bitstreams are
> >> being processed and the answer is bitstreams in the ORIGINAL bundle.
> >> So please check that your content bundles are called ORIGINAL and not
> >> something else (e.g. THUMBNAIL or something custom).
> >>
> >> [1] https://github.com/DSpace/DSpace/blob/dspace-3.2/dspace-api/src/main/java/org/dspace/app/mediafilter/MediaFilterManager.java#L393
> >> [2] https://github.com/DSpace/DSpace/blob/dspace-3.2/dspace-api/src/main/java/org/dspace/app/mediafilter/MediaFilterManager.java#L502
> >>
> >> Regards,
> >> ~~helix84
> >>
> >> Compulsory reading: DSpace Mailing List Etiquette
> >> https://wiki.duraspace.org/display/DSPACE/Mailing+List+Etiquette
> >
> >
>
>
>
> Regards,
> ~~helix84
>
> Compulsory reading: DSpace Mailing List Etiquette
> https://wiki.duraspace.org/display/DSPACE/Mailing+List+Etiquette
>
_______________________________________________
DSpace-tech mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/dspace-tech
List Etiquette: https://wiki.duraspace.org/display/DSPACE/Mailing+List+Etiquette