Hi all,

I modified Matteo's script to get it to work without any "specific" configuration.
I just added some notes after the code, so you may want to take a look at the source :P

#=======================================================================#
# Configuration                                                         #
#=======================================================================#

# Object root directory, as defined in fedora.fcfg
#objroot='/path/to/fedora/object/root/directory'
objroot='/var/fedora/data/objectStore'
# DataStream root directory
DSroot='/var/fedora/data/datastreamStore'

# The awk print string matches the directory pattern defined in
# akubra-llstore.xml
# ORIG awkcmd='{ print substr($0,1,1) "/" substr($0,2,2) "/" substr($0,4,2); }'
awkcmd='{ print substr($0,1,2)}'

#=======================================================================#
# Main                                                                  #
#=======================================================================#

if [ "X$1" == "X" ]
then
   echo "Usage:  $0 <fedora_object_pid>" 1>&2
   exit 1
fi

pid="$1"

path=`echo -n "info:fedora/$pid" | md5sum | awk "$awkcmd"`
normalizedPID=`echo -n "info:fedora/$pid" | sed 's/:/%3A/g' | sed 's#/#%2F#g'`
if [ -f "$objroot/$path/$normalizedPID" ]
then
  # Collect the ID attribute of every <foxml:datastream> element
  DatastreamIDs=`grep -w 'foxml:datastream' "$objroot/$path/$normalizedPID" | awk '{print $2}' | cut -f2 -d '"' | sed '/^$/d'`
else
  echo "No DS found..." 1>&2
  exit 1
fi

for i in $DatastreamIDs
do
  # getting versions from foxml (only those whose ID contains the datastream ID)
  DSversions=`grep 'foxml:datastreamVersion' "$objroot/$path/$normalizedPID" | awk '{print $2}' | cut -f2 -d '"' | sed '/^$/d' | grep "$i"`
  echo -n "foxml:datastream --> "
  echo $i
  echo -n "foxml:datastreamVersion --> "
  echo $DSversions
  for j in $DSversions
  do
    pattern="/$i/$j"
    # Directory from the MD5 of the datastream URI, file name from its URL-encoded form
    DSpath=`echo -n "info:fedora/$pid${pattern}" | md5sum | awk "$awkcmd"`
    fmtpidDS=`echo -n "info:fedora/$pid${pattern}" | sed 's/:/%3A/g' | sed 's#/#%2F#g'`
    if [ -f "$DSroot/$DSpath/$fmtpidDS" ]
    then
      echo "  Path: $DSroot/$DSpath/$fmtpidDS"
      ls -lh "$DSroot/$DSpath/$fmtpidDS" | awk '{print "  Size: "$5}'
      echo ""
    else
      echo "  Not on filesystem..."
      echo ""
    fi
  done
done
exit
################################

This is an output example:

~# ./test.sh uniposmi:6666
foxml:datastream --> AUDIT
foxml:datastreamVersion --> AUDIT.0
  Not on filesystem...

foxml:datastream --> DC
foxml:datastreamVersion --> DC1.0
  Not on filesystem...

foxml:datastream --> PDF
foxml:datastreamVersion --> PDF.0
  Path: /var/fedora/data/datastreamStore/03/info%3Afedora%2Funiposmi%3A6666%2FPDF%2FPDF.0
  Size: 67K

foxml:datastream --> P7M
foxml:datastreamVersion --> P7M.0
  Path: /var/fedora/data/datastreamStore/55/info%3Afedora%2Funiposmi%3A6666%2FP7M%2FP7M.0
  Size: 69K

foxml:datastream --> XML
foxml:datastreamVersion --> XML.0
  Path: /var/fedora/data/datastreamStore/d8/info%3Afedora%2Funiposmi%3A6666%2FXML%2FXML.0
  Size: 276



This works well only if the datastream ID is "contained" in the datastreamVersion ID, e.g.

<foxml:datastream ID="P7M" STATE="A" CONTROL_GROUP="M" VERSIONABLE="false">
<foxml:datastreamVersion ID="P7M.0" [...]

Unfortunately I got stuck trying to make the code work without this "trick", so any ideas, improvements, etc. are welcome!! :)
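
One possible way around the "trick", offered only as a sketch that has not been run against a real repository: instead of grepping the version IDs for the datastream ID, let awk remember which <foxml:datastream> element it is currently inside and print each version ID next to its parent. The function name list_ds_versions is made up for this example, and it assumes, like the grep-based script, that every opening tag sits on a single line together with its ID attribute and that the datastream and datastreamVersion tags are on separate lines.

```shell
#!/bin/bash

# list_ds_versions FOXML_FILE   (hypothetical helper, not part of Fedora)
# Prints one "datastreamID versionID" pair per line.
list_ds_versions() {
  awk '
    # Extract the value of the first ID="..." attribute on a line.
    function id_attr(line) {
      if (!match(line, / ID="[^"]*"/)) return ""
      # Skip the 5 characters of the prefix and drop the closing quote.
      return substr(line, RSTART + 5, RLENGTH - 6)
    }
    # Opening datastream tag: remember its ID as the current parent.
    /<foxml:datastream /        { ds = id_attr($0) }
    # Version tag: print it together with the remembered parent ID.
    /<foxml:datastreamVersion / { if (ds != "") print ds, id_attr($0) }
    # Closing datastream tag: forget the parent.
    /<\/foxml:datastream>/      { ds = "" }
  ' "$1"
}
```

The two nested loops could then be driven by something like `list_ds_versions "$objroot/$path/$normalizedPID" | while read -r i j; do ...; done`, building pattern="/$i/$j" exactly as before, so a version such as ID="SIGNED.0" inside datastream "P7M" would no longer be missed.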

Hope this helps too, bye!

R

On 06/06/2012 10:15, Matteo Boschini wrote:
Here's Scott's original script, with a few (naive) modifications to
get the DS path, given that you know the DS CM.

I'm trying to find a few spare hours (lol), in order to get the DS
directly from the CM instead of hardcoding it in the script.
Hope this helps

########################
#!/bin/bash

#=======================================================================#
# Configuration                                                         #
#=======================================================================#

# Object root directory, as defined in fedora.fcfg
#objroot='/path/to/fedora/object/root/directory'
objroot='/var/fedora/data/objectStore'
# DataStream root directory
DSroot='/var/fedora/data/datastreamStore'
# DataStream "pattern". Depends on ContentModel.
# Hard-coded as for now...
DSCMpattern='/PDF/PDF.0'

# The awk print string matches the directory pattern defined in
# akubra-llstore.xml
# ORIG awkcmd='{ print substr($0,1,1) "/" substr($0,2,2) "/" substr($0,4,2); }'
awkcmd='{ print substr($0,1,2)}'

#=======================================================================#
# Main                                                                  #
#=======================================================================#

if [ "X$1" == "X" ]
then
    echo "Usage:  $0 <fedora_object_pid>" 1>&2
    exit 1
fi

pid="$1"

path=`echo -n "info:fedora/$pid" | md5sum | awk "$awkcmd"`
DSpath=`echo -n "info:fedora/$pid${DSCMpattern}" | md5sum | awk "$awkcmd"`
echo "$DSpath"

fmtpid=`echo -n "info:fedora/$pid" | sed 's/:/%3A/g' | sed 's#/#%2F#g'`
fmtpidDS=`echo -n "info:fedora/$pid${DSCMpattern}" | sed 's/:/%3A/g' |
sed 's#/#%2F#g'`

echo "DBG $fmtpidDS"
ls -l "$objroot/$path/$fmtpid"
ls -l "$DSroot/$DSpath/$fmtpidDS"
exit
################################



On Wed, Jun 6, 2012 at 12:53 AM, Scott Prater <pra...@wisc.edu> wrote:
Here you go, Vincent:

http://sourceforge.net/mailarchive/message.php?msg_id=29350617

As Matteo noted, this is to find just objects, not datastreams.  He
tweaked it to return the path to the datastream:

echo -n "info:fedora/yournamespace:somePID/SOMEDATASTREAM/SOMEDATASTREAM.0" | md5sum

I agree this is a hack, but if you have access to the filesystem where
the datastreams are stored, it is a very fast solution.
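
The same calculation can be wrapped in a small shell function. This is only a sketch: the name ds_store_path and its argument order are invented here, the two-character directory matches the awk pattern used in the scripts in this thread (your akubra-llstore.xml may define a different one), and only ':' and '/' are percent-encoded, as in those scripts.

```shell
#!/bin/bash

# ds_store_path STORE_ROOT PID DSID VERSION_ID   (hypothetical helper)
# Prints the expected filesystem path of a managed datastream version.
ds_store_path() {
  local uri="info:fedora/$2/$3/$4"
  local dir encoded
  # Directory: first two hex digits of the MD5 of the datastream URI
  # (assumes a two-character directory pattern in akubra-llstore.xml).
  dir=$(printf '%s' "$uri" | md5sum | cut -c1-2)
  # File name: the URI with ':' and '/' percent-encoded.
  encoded=$(printf '%s' "$uri" | sed -e 's/:/%3A/g' -e 's#/#%2F#g')
  printf '%s/%s/%s\n' "$1" "$dir" "$encoded"
}
```

For example, `wc -c < "$(ds_store_path /var/fedora/data/datastreamStore yournamespace:somePID SOMEDATASTREAM SOMEDATASTREAM.0)"` would print the size in bytes, provided the file exists at the computed path.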

-- Scott

On 06/05/2012 07:53 AM, Nguyen, Vincent (CDC/OD/OADS) (CTR) wrote:
Benjamin,

The script does have access to the Fedora managed content.  I thought about doing it this
way as well; it would work, but it does seem like a bit of a "hack".

Do you happen to have a link to the script that Scott Prater posted to get
this info?

Thanks everyone for your reply!

Vincent Vu Nguyen


-----Original Message-----
From: Benjamin Armintor [mailto:armin...@gmail.com]
Sent: Monday, June 04, 2012 5:02 PM
To: Support and info exchange list for Fedora users.
Subject: Re: [fcrepo-user] File size of content/datastream

Where is the script that needs this information running? Does it have access to 
the device that Fedora is storing managed content on?  I think Scott Prater
posted a script to this list a couple of days ago that formulated a file path 
to the managed data using the same algorithm Fedora does internally, and 
something like that might be your best option until you can upgrade to 3.4+.

- Ben

On Mon, Jun 4, 2012 at 4:57 PM, James, Eric <eric.ja...@yale.edu> wrote:
Vincent,

You could get the size programmatically with a Java method such as the one below.  I
commented out the urlConn.getContentLength() call because, as you found out, it returns
0, but the byte counter loop works:

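import java.io.InputStream;
import java.net.URL;
import java.net.URLConnection;

public int getEADSize(String hostname, String pid) throws Exception {
    String urlString = hostname + "/fedora/get/" + pid + "/EAD";
    URL url = new URL(urlString);
    URLConnection urlConn = url.openConnection();
    //int len = urlConn.getContentLength();
    InputStream is = urlConn.getInputStream();
    int len = 0;
    try {
        // Count the bytes in chunks instead of one read() call per byte
        byte[] buf = new byte[8192];
        int n;
        while ((n = is.read(buf)) != -1) {
            len += n;
        }
    } finally {
        is.close();  // always release the connection's stream
    }
    return len;
}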

-Eric
________________________________________
From: Nguyen, Vincent (CDC/OD/OADS) (CTR) [v...@cdc.gov]
Sent: Monday, June 04, 2012 4:03 PM
To: Support and info exchange list for Fedora users.
Subject: Re: [fcrepo-user] File size of content/datastream

Thanks for the Replies.

Rebecca, the getSize() returns '0' for Managed Datastreams like the ticket 
says.  We plan on upgrading but for now, we're locked in with version 3.2 for 
at least 6 months.

Kyle, unfortunately the HTTP HEAD request does the same thing.  It doesn't return 
"Content-Length".  When I print getHeaderFieldKey and getHeaderField, this is 
what I get (DS1 is the managed datastream I want to grab the info for).

DC - 417
MODS - 3489
THUMBNAIL_LARGE - 0
THUMBNAIL_SMALL - 0
RELS-EXT - 385
DS1 - 0

Server HTTP version, Response code:
HTTP/1.1 200 OK

Server=Apache-Coyote/1.1
Set-Cookie=JSESSIONID=A7F1672733ECCB504E5C8ACD4A556C91; Path=/muradora
Content-Disposition=attachment; filename=demo_12679DS1.pdf
Content-Type=application/pdf
Transfer-Encoding=chunked
Date=Mon, 04 Jun 2012 19:58:30 GMT


The only workaround I can think of is to index the 'filesize' value during
ingest, which means we'll have to reindex every object.


Vincent Vu Nguyen

-----Original Message-----
From: Rebecca Sutton Koeser [mailto:rebecca.s.koe...@emory.edu]
Sent: Monday, June 04, 2012 2:47 PM
To: Support and info exchange list for Fedora users.
Subject: Re: [fcrepo-user] File size of content/datastream

On Monday June 04, 2012 at 06:35 PM, Nguyen, Vincent (CDC/OD/OADS) (CTR) wrote:
Is there a way to get the Filesize of a Managed Datastream without
having to actually download the file?

We're on Fedora 3.2.

The datastream information provided by API-M getDatastream should include the 
size of the datastream.

However, there have been issues with datastream size in certain versions of 
Fedora, and I think managed datastreams in 3.2 might be problematic.  You can 
probably confirm by checking on some of your content.  The JIRA ticket I'm 
thinking of indicates this was fixed in Fedora 3.4:

https://jira.duraspace.org/browse/FCREPO-64


--
Rebecca Sutton Koeser, Ph.D.
rebecca.s.koe...@emory.edu
Digital Programs & Systems - Woodruff Library, Emory University

----------------------------------------------------------------------
--------
Live Security Virtual Conference
Exclusive live event will cover all the ways today's security and
threat landscape has changed and how IT managers can respond.
Discussions will include endpoint security, mobile security and the
latest in malware threats.
http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/
_______________________________________________
Fedora-commons-users mailing list
Fedora-commons-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/fedora-commons-users

--
Scott Prater
Shared Development Group
General Library System
University of Wisconsin - Madison
pra...@wisc.edu
5-5415

--
Dr. Riccardo Valzorio
Servizi Sistemistici, sicurezza e reti
mail: valzo...@cilea.it - skype: riccardo.valzorio
Ph: +39 02  26995.384 - mob. +39 348 1328436 - fax +39 02 2135520
CILEA - Consorzio Interuniversitario
http://www.cilea.it/disclaimer

"A computer is like air conditioning: it becomes useless when you open windows." L. Torvalds

