Hi Monika,

Thanks for your tip! 
However, when I ran your command on our assetstore folder, it listed every 
"small" file, not only the PDF ones. Here it found 7281 files...
Since we do not use Ruby here, helix's approach, using an SQL query, was 
satisfactory.
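For reference, an SQL lookup of this kind might look like the sketch below. It is not the query helix posted, which is not quoted here. It assumes the DSpace 5.x PostgreSQL schema (tables `bitstream` and `bitstreamformatregistry`, columns `size_bytes`, `internal_id`, `bitstream_format_id`, `mimetype` and `deleted`), so check those names against your own database before relying on it. The function name `small_pdf_query` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print an SQL query that lists non-deleted PDF
# bitstreams smaller than a given number of bytes. Table and column names
# assume the DSpace 5.x PostgreSQL schema; verify them before use.
small_pdf_query() {
  limit=${1:-900}   # size threshold in bytes
  # Refuse anything that is not a plain integer, so it is safe to interpolate.
  case $limit in
    ''|*[!0-9]*) echo "limit must be a non-negative integer" >&2; return 1 ;;
  esac
  cat <<SQL
SELECT b.bitstream_id, b.internal_id, b.size_bytes
FROM bitstream b
JOIN bitstreamformatregistry f ON f.bitstream_format_id = b.bitstream_format_id
WHERE f.mimetype = 'application/pdf'
  AND b.size_bytes < ${limit}
  AND b.deleted = false
ORDER BY b.size_bytes;
SQL
}
```

The query text can then be piped into `psql`, e.g. `small_pdf_query 600 | psql -U dspace -d dspace`, where the user and database names `dspace` are assumptions about a default install.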
Thanks!

Michelangelo

On Wednesday, March 30, 2016 at 12:50:07 UTC-3, momeven wrote:
>
> Michelângelo
>
> If you are on a Unix system, you can use the find command to list the files 
> in your assetstore that are small:
>
> cd /dspace/assetstore 
> find . -type f -and -size -900c -exec basename {} \;
>
> The names that are printed correspond to the internal IDs of the bitstreams.
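Because assetstore files carry no file extension, the find command above cannot tell PDFs apart from other small files. A minimal sketch that keeps only the small files whose first four bytes are the PDF signature `%PDF`, assuming the broken stubs still start with that signature (inspect one with `head -c 64` first). The name `list_small_pdfs` is hypothetical, and the 900-byte default simply mirrors the command above.

```shell
#!/bin/sh
# Hypothetical helper: print the internal IDs (file names) of assetstore
# files smaller than LIMIT bytes whose content starts with "%PDF".
list_small_pdfs() {
  dir=${1:-/dspace/assetstore}   # assetstore root (default install path)
  limit=${2:-900}                # size threshold in bytes
  find "$dir" -type f -size -"${limit}"c | while IFS= read -r f; do
    # Compare the first 4 bytes with the PDF magic number.
    if [ "$(head -c 4 "$f")" = "%PDF" ]; then
      basename "$f"              # the file name is the bitstream's internal ID
    fi
  done
}
```

Usage: `list_small_pdfs /dspace/assetstore 900 > small_pdf_ids.txt`.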
>
> Given those internal IDs, you can go to the database and try to get the 
> information with SQL queries. Personally, I dislike that approach; for that 
> reason I developed a small Ruby gem that interacts with the DSpace API 
> classes, so that I can script these maintenance activities.
>
> Using the gem, I put together a script that reads internal IDs from a file 
> or stdin and then prints information about the corresponding bitstreams, one 
> line per internal ID:
>
> 96563514287524427390952035236210734474  99343   ITEM.80161  99999/fk4vq36h8m    COLLECTION.87   88435/dsp01x633f104k    COMMUNITY.67    88435/dsp01td96k251d
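If the report columns are tab-separated in the order shown (internal ID, a numeric ID, item, item handle, collection, collection handle, community, community handle), which is an assumption based on this single sample line, a small awk filter can turn each line into a link to the affected item. The function name `report_to_item_urls` is hypothetical, and the default base URL is taken from the repository address in the original question; DSpace serves items under `<base>/handle/<handle>`.

```shell
#!/bin/sh
# Hypothetical filter: read report lines on stdin and print
# "<internal id><TAB><item URL>". Assumes tab-separated columns with the
# item handle in column 4, as in the sample line above.
report_to_item_urls() {
  base=${1:-http://repositorio.pucrs.br/dspace}   # repository base URL
  awk -F'\t' -v base="$base" 'NF >= 4 { print $1 "\t" base "/handle/" $4 }'
}
```

It would sit at the end of the pipeline, e.g. `bundle exec statistics/bitstreams_from_internalids.rb ids.txt | report_to_item_urls`.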
>
> Have a look at:
> - The gem: https://github.com/akinom/dspace-jruby
> - The CLI scripts that use the gem: https://github.com/akinom/dspace-cli
> - The script that generates the bitstream report (tsp): 
>   dspace-cli/statistics/bitstreams_from_internalids.rb 
>   <https://github.com/akinom/dspace-cli/blob/master/statistics/bitstreams_from_internalids.rb>
>
> You can run the script (once you have JRuby installed) as follows:
>
> cd dspace-cli
> export DSPACE_HOME=<dspace install dir>   # only needed if different from /dspace
> bundle exec statistics/bitstreams_from_internalids.rb file_with_bitstream_internal_ids
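To produce the `file_with_bitstream_internal_ids` input, the find command from the top of this message can write one internal ID per line into a file. A minimal sketch; the function name `write_internal_ids` and the default output file name are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: write the internal IDs of assetstore files smaller
# than LIMIT bytes to OUT, one per line and sorted, as input for the report script.
write_internal_ids() {
  dir=${1:-/dspace/assetstore}                 # assetstore root
  limit=${2:-900}                              # size threshold in bytes
  out=${3:-file_with_bitstream_internal_ids}   # output file
  find "$dir" -type f -size -"${limit}"c -exec basename {} \; | sort > "$out"
}
```

Usage: run `write_internal_ids /dspace/assetstore 900 ids.txt`, then `bundle exec statistics/bitstreams_from_internalids.rb ids.txt` from the dspace-cli directory. Keep the output file outside the assetstore so it is not picked up by the next run.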
>
>
> I hope this helps 
>
> Monika
>
> ----
> Monika Mevenkamp
> Digital Repository Infrastructure Developer
> Princeton University
> Phone: 609-258-4161
> Skype: mo-meven
>
>
>
> On Mar 30, 2016, at 8:59 AM, Michelangelo Viana <[email protected]> wrote:
>
> Hi fellows,
>
> We are on DSpace 5.1 (http://repositorio.pucrs.br/dspace) and need to 
> know how to list corrupted PDFs (empty files) on our DSpace server.
> As far as I can see, the Curation Tasks do not provide this kind of report, 
> and the checksum report, although it does list this kind of error, is too 
> confusing for me to understand.
>
> We automatically load some items (metadata + PDF) into DSpace using the 
> *import* command: the metadata is taken from the ALEPH500 system and the PDF 
> from another system (TEDE2). However, when the PDF cannot be found during 
> the process (e.g., the TEDE2 server is offline), the record is still created 
> (imported) in DSpace with an "empty" PDF. In the UI, the PDF file is only 
> 569 B (569 bytes), and we only find out about this error when users report 
> it to us...
>
> - Can someone give me a clue on how to list these "empty" PDF files?
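Since the broken uploads show up as exactly 569 bytes, matching that exact size with find should narrow the list far more than a "smaller than 900 bytes" test. A sketch, assuming every failed import produces a stub of that same size (a different stub size would need a different value); `list_stub_files` is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: print the internal IDs of assetstore files that are
# exactly SIZE bytes long (569 bytes is the stub size reported above).
list_stub_files() {
  dir=${1:-/dspace/assetstore}   # assetstore root
  size=${2:-569}                 # exact size in bytes
  # With the "c" suffix, -size N matches files of exactly N bytes.
  find "$dir" -type f -size "${size}c" -exec basename {} \;
}
```

Usage: `list_stub_files /dspace/assetstore 569`.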
>
> Thanks in advance,
>
> Michelângelo Viana
> PUCRS/Main Library/Brazil
>
> -- 
> You received this message because you are subscribed to the Google Groups 
> "DSpace Technical Support" group.
> To unsubscribe from this group and stop receiving emails from it, send an 
> email to [email protected].
> To post to this group, send email to [email protected].
> Visit this group at https://groups.google.com/group/dspace-tech.
> For more options, visit https://groups.google.com/d/optout.
>
>
>
>
