Hi all,

I modified Matteo's script to get it to work without "specific" configuration. I have just added my remarks after the code, so maybe you will want to take a look at the source :P
#!/bin/bash
#=======================================================================#
#                            Configuration                              #
#=======================================================================#

# Object root directory, as defined in fedora.fcfg
#objroot='/path/to/fedora/object/root/directory'
objroot='/var/fedora/data/objectStore'

# DataStream root directory
DSroot='/var/fedora/data/datastreamStore'

# The awk print string matches the directory pattern defined in
# akubra-llstore.xml
# ORIG awkcmd='{ print substr($0,1,1) "/" substr($0,2,2) "/" substr($0,4,2); }'
awkcmd='{ print substr($0,1,2)}'

#=======================================================================#
#                                 Main                                  #
#=======================================================================#

if [ "X$1" == "X" ]
then
    echo "Usage: $0 <fedora_object_pid>" 1>&2
    exit 1
fi

pid="$1"
path=`echo -n "info:fedora/$pid" | md5sum | awk "$awkcmd"`
normalizedPID=`echo -n "info:fedora/$pid" | sed 's/:/%3A/g' | sed 's#/#%2F#g'`

if [ -f "$objroot/$path/$normalizedPID" ]
then
    ## getting Datastream IDs!
    DatastreamIDs=`cat "$objroot/$path/$normalizedPID" | grep -w 'foxml:datastream' | awk '{print $2}' | cut -f2 -d '"' | sed '/^$/d'`
else
    echo "No DS found..."
    exit 1
fi

for i in $DatastreamIDs
do
    # getting versions from foxml (matched to the datastream by ID substring)
    DSversions=`cat "$objroot/$path/$normalizedPID" | grep 'foxml:datastreamVersion' | awk '{print $2}' | cut -f2 -d '"' | sed '/^$/d' | grep "$i"`
    echo -n "foxml:datastream --> "
    echo $i
    echo -n "foxml:datastreamVersion --> "
    echo $DSversions
    for j in $DSversions
    do
        pattern="/$i/$j"
        DSpath=`echo -n "info:fedora/$pid${pattern}" | md5sum | awk "$awkcmd"`
        fmtpidDS=`echo -n "info:fedora/$pid${pattern}" | sed 's/:/%3A/g' | sed 's#/#%2F#g'`
        if [ -f "$DSroot/$DSpath/$fmtpidDS" ]
        then
            echo " Path: $DSroot/$DSpath/$fmtpidDS"
            ls -lh "$DSroot/$DSpath/$fmtpidDS" | awk '{print " Size: "$5}'
            echo ""
        else
            echo " Not on filesystem..."
            echo ""
        fi
    done
done
exit
################################

This is an output example:

~# ./test.sh uniposmi:6666
foxml:datastream --> AUDIT
foxml:datastreamVersion --> AUDIT.0
 Not on filesystem...

foxml:datastream --> DC
foxml:datastreamVersion --> DC1.0
 Not on filesystem...

foxml:datastream --> PDF
foxml:datastreamVersion --> PDF.0
 Path: /var/fedora/data/datastreamStore/03/info%3Afedora%2Funiposmi%3A6666%2FPDF%2FPDF.0
 Size: 67K

foxml:datastream --> P7M
foxml:datastreamVersion --> P7M.0
 Path: /var/fedora/data/datastreamStore/55/info%3Afedora%2Funiposmi%3A6666%2FP7M%2FP7M.0
 Size: 69K

foxml:datastream --> XML
foxml:datastreamVersion --> XML.0
 Path: /var/fedora/data/datastreamStore/d8/info%3Afedora%2Funiposmi%3A6666%2FXML%2FXML.0
 Size: 276

This works well only if the datastream ID is "contained" in the datastreamVersion ID, because the versions are matched to their datastream with a plain grep on the ID, e.g.

<foxml:datastream ID="P7M" STATE="A" CONTROL_GROUP="M" VERSIONABLE="false">
<foxml:datastreamVersion ID="P7M.0" [...]

Unfortunately I got stuck trying to make the code work without this "trick", so any ideas, improvements, etc. are welcome!! :)
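
One possible way around the containment "trick" is to let awk remember which foxml:datastream element is currently open and pair every foxml:datastreamVersion with it, instead of grepping for the ID. The following is only a minimal sketch, not tested against a real repository. It assumes, as the pipeline above already does, that each tag starts on its own line and that ID is its first attribute. The function names list_ds_versions and ds_path are made up for illustration; the md5 prefix and the %3A/%2F encoding are the same as in the script above.

```shell
# Datastream root directory (same default as in the script above).
DSroot=${DSroot:-/var/fedora/data/datastreamStore}

# list_ds_versions FOXML_FILE
# Prints one "<datastream ID> <datastreamVersion ID>" pair per line.
# ASSUMPTION: every tag starts on its own line and ID is the first attribute.
list_ds_versions() {
    awk '
        # Opening datastream tag: remember its ID for the versions that follow.
        $1 == "<foxml:datastream"        { split($2, a, "\""); ds = a[2]; next }
        # Version tag: pair its ID with the enclosing datastream ID.
        $1 == "<foxml:datastreamVersion" { split($2, a, "\""); print ds, a[2] }
    ' "$1"
}

# ds_path PID DATASTREAM_ID VERSION_ID
# Prints the expected file path below $DSroot, using the two-character
# md5 prefix as directory and the URL-encoded URI as file name.
ds_path() {
    ds_uri="info:fedora/$1/$2/$3"
    ds_dir=$(printf '%s' "$ds_uri" | md5sum | cut -c1-2)
    ds_file=$(printf '%s' "$ds_uri" | sed -e 's/:/%3A/g' -e 's#/#%2F#g')
    printf '%s/%s/%s\n' "$DSroot" "$ds_dir" "$ds_file"
}

# Possible use inside the script above (not executed here):
#   list_ds_versions "$objroot/$path/$normalizedPID" |
#   while read -r ds ver; do ds_path "$pid" "$ds" "$ver"; done
```

Since the pairing comes from the nesting of the elements, a version ID no longer has to contain its datastream ID. If the FOXML is serialized differently (attributes in another order, several tags on one line), a real XML parser would be the safer choice.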
Hope this helps too, bye!
R

On 06/06/2012 10:15, Matteo Boschini wrote:
Here's Scott's original script, with a few (naive) modifications to get the DS path, given that you know the DS CM. I'm trying to find a few spare hours (lol) in order to get the DS directly from the CM instead of hardcoding it in the script.

Hope this helps

########################
#!/bin/bash
#=======================================================================#
#                            Configuration                              #
#=======================================================================#

# Object root directory, as defined in fedora.fcfg
#objroot='/path/to/fedora/object/root/directory'
objroot='/var/fedora/data/objectStore'

# DataStream root directory
DSroot='/var/fedora/data/datastreamStore'

# DataStream "pattern". Depends on ContentModel.
# Hard-coded as for now...
DSCMpattern='/PDF/PDF.0'

# The awk print string matches the directory pattern defined in
# akubra-llstore.xml
# ORIG awkcmd='{ print substr($0,1,1) "/" substr($0,2,2) "/" substr($0,4,2); }'
awkcmd='{ print substr($0,1,2)}'

#=======================================================================#
#                                 Main                                  #
#=======================================================================#

if [ "X$1" == "X" ]
then
    echo "Usage: $0 <fedora_object_pid>" 1>&2
    exit 1
fi

pid="$1"
path=`echo -n "info:fedora/$pid" | md5sum | awk "$awkcmd"`
DSpath=`echo -n "info:fedora/$pid${DSCMpattern}" | md5sum | awk "$awkcmd"`
echo "$DSpath"
fmtpid=`echo -n "info:fedora/$pid" | sed 's/:/%3A/g' | sed 's#/#%2F#g'`
fmtpidDS=`echo -n "info:fedora/$pid${DSCMpattern}" | sed 's/:/%3A/g' | sed 's#/#%2F#g'`
echo "DBG $fmtpidDS"
ls -l "$objroot/$path/$fmtpid"
ls -l "$DSroot/$DSpath/$fmtpidDS"
exit
################################

On Wed, Jun 6, 2012 at 12:53 AM, Scott Prater <pra...@wisc.edu> wrote:

Here you go, Vincent:

http://sourceforge.net/mailarchive/message.php?msg_id=29350617

As Matteo noted, this is to find just objects, not datastreams.
He tweaked it to return the path to the datastream:

echo -n "info:fedora/yournamespace:somePID/SOMEDATASTREAM/SOMEDATASTREAM.0" | md5sum

I agree this is a hack, but if you have access to the filesystem where the datastreams are stored, it is a very fast solution.

-- Scott

On 06/05/2012 07:53 AM, Nguyen, Vincent (CDC/OD/OADS) (CTR) wrote:

Benjamin,

The script does have access to the Fedora managed content. I thought about doing it this way as well. It would work, but it does seem like a bit of a "hack". Do you happen to have a link to the script that Scott Prater posted to get this info?

Thanks everyone for your replies!

Vincent Vu Nguyen

-----Original Message-----
From: Benjamin Armintor [mailto:armin...@gmail.com]
Sent: Monday, June 04, 2012 5:02 PM
To: Support and info exchange list for Fedora users.
Subject: Re: [fcrepo-user] File size of content/datastream

Where is the script that needs this information running? Does it have access to the device that Fedora is storing managed content on? I think Scott Prater posted a script to this list a couple of days ago that formulated a file path to the managed data using the same algorithm Fedora does internally, and something like that might be your best option until you can upgrade to 3.4+.

- Ben

On Mon, Jun 4, 2012 at 4:57 PM, James, Eric <eric.ja...@yale.edu> wrote:

Vincent,

You could get the size programmatically with a Java method such as the one below.
I commented out the urlConn.getContentLength() call because, as you found out, that returns 0. But the byte counter loop works:

import java.io.InputStream;
import java.net.URL;
import java.net.URLConnection;

public int getEADSize(String hostname, String pid) throws Exception {
    String urlString = hostname + "/fedora/get/" + pid + "/EAD";
    URL url = new URL(urlString);
    URLConnection urlConn = url.openConnection();
    // Does not give a usable size for managed datastreams on this setup
    //int len = urlConn.getContentLength();
    int len = 0;
    // Count the bytes by reading the whole stream; close it when done
    try (InputStream is = urlConn.getInputStream()) {
        while (is.read() != -1) {
            len++;
        }
    }
    return len;
}

-Eric

________________________________________
From: Nguyen, Vincent (CDC/OD/OADS) (CTR) [v...@cdc.gov]
Sent: Monday, June 04, 2012 4:03 PM
To: Support and info exchange list for Fedora users.
Subject: Re: [fcrepo-user] File size of content/datastream

Thanks for the replies.

Rebecca, the getSize() returns '0' for Managed Datastreams, like the ticket says. We plan on upgrading, but for now we're locked in with version 3.2 for at least 6 months.

Kyle, unfortunately the HTTP HEAD request does the same thing. It doesn't return "Content-Length". When I print getHeaderFieldKey and getHeaderField, this is what I get (DS1 is the managed datastream I want to grab the info for).

DC - 417
MODS - 3489
THUMBNAIL_LARGE - 0
THUMBNAIL_SMALL - 0
RELS-EXT - 385
DS1 - 0

Server HTTP version, Response code: HTTP/1.1 200 OK
Server=Apache-Coyote/1.1
Set-Cookie=JSESSIONID=A7F1672733ECCB504E5C8ACD4A556C91; Path=/muradora
Content-Disposition=attachment; filename=demo_12679DS1.pdf
Content-Type=application/pdf
Transfer-Encoding=chunked
Date=Mon, 04 Jun 2012 19:58:30 GMT

The only workaround I can think of is to index the 'filesize' value during ingest, which means we'll have to reindex every object.

Vincent Vu Nguyen

-----Original Message-----
From: Rebecca Sutton Koeser [mailto:rebecca.s.koe...@emory.edu]
Sent: Monday, June 04, 2012 2:47 PM
To: Support and info exchange list for Fedora users.
Subject: Re: [fcrepo-user] File size of content/datastream

On Monday June 04, 2012 at 06:35 PM, Nguyen, Vincent (CDC/OD/OADS) (CTR) wrote:

Is there a way to get the Filesize of a Managed Datastream without having to actually download the file? We're on Fedora 3.2.

The datastream information provided by API-M getDatastream should include the size of the datastream. However, there have been issues with datastream size in certain versions of Fedora, and I think managed datastreams in 3.2 might be problematic. You can probably confirm by checking on some of your content.

The JIRA ticket I'm thinking of indicates this was fixed in Fedora 3.4:
https://jira.duraspace.org/browse/FCREPO-64

--
Rebecca Sutton Koeser, Ph.D.
rebecca.s.koe...@emory.edu
Digital Programs & Systems - Woodruff Library, Emory University

------------------------------------------------------------------------------
Live Security Virtual Conference
Exclusive live event will cover all the ways today's security and threat landscape has changed and how IT managers can respond. Discussions will include endpoint security, mobile security and the latest in malware threats.
http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/
_______________________________________________
Fedora-commons-users mailing list
Fedora-commons-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/fedora-commons-users
--
Scott Prater
Shared Development Group
General Library System
University of Wisconsin - Madison
pra...@wisc.edu
5-5415
--
Dr. Riccardo Valzorio
Servizi Sistemistici, sicurezza e reti
mail: valzo...@cilea.it - skype: riccardo.valzorio
Ph: +39 02 26995.384 - mob. +39 348 1328436 - fax +39 02 2135520
CILEA - Consorzio Interuniversitario
http://www.cilea.it/disclaimer

"A computer is like air conditioning: it becomes useless when you open windows." L. Torvalds