Yes, that would take a while to do since we have approximately 188,000 Items in our repository and quite a few Items have more than one bitstream. :-)
I did figure out a way to compare what was in the bitstream table with what was on the file system and, luckily, we are not missing any assetstore files.

Thanks for your email!
Sue

Sue Walker-Thornton
(w): (757) 864-2368
(m): (757) 506-9903

-----Original Message-----
From: Brian Freels-Stendel [mailto:[email protected]]
Sent: Friday, October 26, 2012 2:07 PM
To: [email protected]; Thornton, Susan M. (LARC-B702)[LITES]
Cc: dspace-tech
Subject: Re: [Dspace-tech] Assetstore structure question

Good afternoon,

If there are files in all the right places, it might be less effort to export the entire repo and then import into a fresh instance. Maybe. I know you'uns have a lot of stuff.

B--

>>> On 10/26/2012 at 11:58 AM, in message <39919fc910d0004bbec8514947c345759b734f1...@ndjsscc07.ndc.nasa.gov>, "Thornton, Susan M. (LARC-B702)[LITES]" <[email protected]> wrote:
> What I'm in the process of doing to validate the integrity of the restore is this:
>
> 1. List all files (find -type f) in the /assetstore1 directory, then load the list into a temporary table with 3 columns in Postgres:
>    a. Record_id (unique identifier)
>    b. sub_directory (contains the 1st 6 digits of the file name)
>    c. file_name (contains the entire file name)
> 2. Execute a SQL query that looks at all rows in the bitstream table where store_number = 1 and checks whether the first 6 digits match the subdirectory the file was found in on /assetstore1, and whether the file name matches the internal_id from the bitstream table.
> 3. Identify “orphan” files in /assetstore1 by reversing the “EXISTS” logic in step 2 above, to see if all the files found under /assetstore2 exist in the bitstream table.
>
> I just can’t figure out why there are so many duplicates and why there are tons of files 2 levels down from /assetstore1 that shouldn’t be there.
>
> Example:
>
> [cid:[email protected]]
>
> In this example, the highlighted files at the bottom of the screenshot shouldn’t be there.
> They’re all just 2 levels down and they should be in other directories. Here’s where I’m finding the duplicates. The highlighted files are in this incorrect subdirectory, but they’re ALSO in the correct subdirectory.
>
> I’m likely going to end up having to write a script to delete these “orphan” files. I thought maybe the DSpace cleanup script would take care of this, but it doesn’t. (SUGGESTION FOR FUTURE RELEASE!!! ☺)
>
> THANKS,
> Sue Walker-Thornton
> (w): (757) 864-2368
> (m): (757) 506-9903
>
> -----Original Message-----
> From: [email protected] [mailto:[email protected]] On Behalf Of helix84
> Sent: Thursday, October 25, 2012 5:37 PM
> To: Thornton, Susan M. (LARC-B702)[LITES]
> Cc: dspace-tech
> Subject: Re: [Dspace-tech] Assetstore structure question
>
> On Thu, Oct 25, 2012 at 11:02 PM, Thornton, Susan M. (LARC-B702)[LITES] <[email protected]> wrote:
> >> Hi Helix,
> >> We had a hardware failure on one of our assetstore file systems recently and I'm working on verifying the restore worked properly (it didn't as I'm finding lots of duplicate files and files at depth level 3).
> >>
> >> Yes, I understand that in the bitstream table, "store_number" tells DSpace where to find the file and the "internal_id" tells DSpace 2 things:
> >> 1. which subdirectory in that store_number the file resides (according to the first 6 digits)
> >> 2. what the actual file name is
> >>
> >> What a mess!
> >> Thanks for your reply - it was as I thought it was!
> >> Sue
>
> You may be lucky (as strange as it may sound) if you're getting whole files, which would indicate that just the filesystem was corrupted, not the contents of bitstreams. It should be possible to find the right location for files.
>
> If I were you, I'd make a list of all the files (find /dspace/assetstore -type f) then look up their names in the "bitstream" table, then take the corresponding checksum and run md5sum on the bitstream to verify its integrity.
>
> As for locations within assetstore, if the bitstream name is correct, it's easy to find the directory by the first three numbers of the bitstream filename.
>
> If the bitstream filename is corrupted, it would be harder. You could make a list of all their md5 sums and then do the lookup I mentioned in reverse - find the bitstream filename by its md5 sum in the "bitstream" table.
>
> Good luck! Just ask if you're in doubt.
>
> Regards,
> ~~helix84
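
The checks discussed in this thread (match each bitstream row to a file in the expected subdirectory, verify its MD5, and flag files that sit in the wrong place or are not in the table at all) could be sketched roughly as follows. This is only an illustration under stated assumptions, not DSpace code: it assumes the subdirectory is built from the first six digits of internal_id as three two-digit levels, and that the internal_id and checksum values have already been exported from the bitstream table into a dictionary. The names `expected_relpath`, `md5_of`, and `audit_assetstore` are hypothetical.

```python
import hashlib
import os
from pathlib import Path

# Assumption: three directory levels of two digits each, i.e. the
# "first 6 digits" of internal_id mentioned in the thread.
DIGITS_PER_LEVEL = 2
DIRECTORY_LEVELS = 3


def expected_relpath(internal_id: str) -> Path:
    """Relative path where a file named internal_id is expected to live."""
    parts = [
        internal_id[i * DIGITS_PER_LEVEL:(i + 1) * DIGITS_PER_LEVEL]
        for i in range(DIRECTORY_LEVELS)
    ]
    return Path(*parts) / internal_id


def md5_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """MD5 hex digest of a file, read in chunks to keep memory use low."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def audit_assetstore(root, bitstreams):
    """Compare one assetstore directory against bitstream rows.

    bitstreams maps internal_id -> MD5 hex checksum (a hypothetical
    export of the rows for a single store_number).
    """
    root = Path(root)
    report = {"missing": [], "corrupt": [], "misplaced": [], "orphan": []}
    expected = {expected_relpath(iid): iid for iid in bitstreams}

    # Table -> file system: is every row backed by an intact file?
    for relpath, internal_id in expected.items():
        full = root / relpath
        if not full.is_file():
            report["missing"].append(internal_id)
        elif md5_of(full) != bitstreams[internal_id].lower():
            report["corrupt"].append(internal_id)

    # File system -> table: which files are not where a row expects them?
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            rel = (Path(dirpath) / name).relative_to(root)
            if rel in expected:
                continue
            # Known name in the wrong directory vs. name unknown to the table.
            key = "misplaced" if name in bitstreams else "orphan"
            report[key].append(rel.as_posix())

    for entries in report.values():
        entries.sort()
    return report
```

The "misplaced" list corresponds to the duplicates described above (files whose names are valid internal_id values but that sit two levels down instead of in the expected subdirectory). The sketch only reports; deleting anything is left as a separate, manual decision.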

