Hey Tom,

-----Original Message-----
From: Thomas Bennett <[email protected]>
Reply-To: "[email protected]" <[email protected]>
Date: Friday, May 30, 2014 6:26 AM
To: OODT <[email protected]>
Subject: Re: Metadata based versioning

>Hey Chris,
>
>Thanks for your reply.
>
>I think you may have clarified some of my understanding of oodt under the
>hood. Woot. (or is that woodt?)

Haha, woodt it is.

>
>Firstly, from OODT-72 I can see how the design decisions were made. It just
>so happens that I'm wanting the filename for versioning and suddenly I
>understand why FinalFileLocationExtractor is needed. I will now use it with
>confidence :-).
>
>Your use of the term 'client side data movement' confused me at first, so I
>had to think about it a bit. I was always under the impression (a naive
>misconception) that if your file manager existed on a "machine B" you would
>need to do a remote data transfer to use that file manager.
>
>But what you're saying is that the following setup is possible:
>
>   - Machine A (client): crawler + repository path + local data transfer
>     (i.e. machine A, or the 'client', does not need a file manager running
>     and does not need to do a remote data transfer to machine B)
>   - Machine B (server): file manager (does not need the repository path
>     to archive files)

Yes, this is totally possible. Imagine the following configuration:

Machine A: no file manager, but has crawler, + can see src + dest path
with local data transfer (note *local* is a misnomer, b/c through
distributed file systems like NFS, Hadoop, Spark/Shark, GlusterFS, etc.
we can logically mount local commodity shared-nothing disks and federate
them to make them appear like one big one - each of the preceding
distributed file system technologies has different strengths and
benefits, but from OODT's perspective, it can all be local even if it
truly isn't).

Machine B: file manager

Use Case: Ingest a file on machine A into the File Manager on machine B.
- totally doable - crawler on A contacts (by default) http://B:9000/ and
then ingests into file manager using client side transfer.

Make sense?

>
>Have I got the right idea?

Yep!

Cheers,
Chris

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Chief Architect
Instrument Software and Science Data Systems Section (398)
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 168-519, Mailstop: 168-5th floor
Email: [email protected]
WWW: http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Associate Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

>
>
>On 30 May 2014 07:01, Mattmann, Chris A (3980) <
>[email protected]> wrote:
>
>> Hey Tom,
>>
>> You've correctly discovered this. This was an intentional by-design
>> artifact of my belief that versioning and data movement should be
>> sort of co-located on the same machine. So if you do client side
>> data movement (which most people do), then the versioning should
>> happen alongside of it, and thus any metadata extraction present
>> there should be available during versioning for use in e.g., Metadata
>> based versioning.
>>
>> The rub comes in the issue where the metadata is generated on the
>> server side and you expect versioning to be available to the system.
>> One way of getting around this is taking a look at the way that
>> the FinalFileLocationExtractor [1] grabs the latest version of the
>> CoreMetKeys.FILE_LOCATION property and then makes it available for e.g.,
>> versioning.
>>
>> See discussion too in OODT-72 [2] for some rationale behind my
>> sentiments there. Happy to discuss!
>>
>> Cheers,
>> Chris
>>
>> [1] http://s.apache.org/bvd
>> [2] https://issues.apache.org/jira/browse/OODT-72
>>
>>
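
For readers of the archive, the Machine A / Machine B setup discussed in this thread can be sketched roughly as follows. This is only an illustration of the idea of client side data movement, not OODT code: the function names (version_path, client_side_ingest), the register callback standing in for the call to the file manager on machine B, the "ProductType" key, and the directory layout are all hypothetical, and the metadata key names are used for illustration only.

```python
import shutil
import tempfile
from pathlib import Path


def version_path(repo_root, metadata):
    """Hypothetical metadata-based versioner: derive the archive
    location from metadata already extracted on the client."""
    return Path(repo_root) / metadata["ProductType"] / metadata["Filename"]


def client_side_ingest(src_file, repo_root, product_type, register):
    """Runs on machine A (the crawler side).

    The file is moved with a plain local copy, because machine A can see
    both the source and the destination path (possibly via a shared or
    distributed file system). Only the metadata is handed to the file
    manager, represented here by the `register` callback.
    """
    src = Path(src_file)
    metadata = {"Filename": src.name, "ProductType": product_type}

    # Versioning happens next to the data movement, on the client.
    dest = version_path(repo_root, metadata)
    dest.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(src, dest)  # "local" data transfer

    # Record where the file ended up, then tell the file manager.
    metadata["FileLocation"] = str(dest.parent)
    register(metadata)  # stand-in for contacting machine B
    return metadata


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        staging = Path(tmp) / "staging"
        staging.mkdir()
        sample = staging / "obs_001.dat"
        sample.write_text("payload")

        catalog = []  # pretend catalog held by the file manager
        client_side_ingest(sample, Path(tmp) / "archive", "RawData", catalog.append)
        print(catalog[0]["FileLocation"])
```

The point of the sketch is that machine B never touches the file bytes or the repository path; it only receives metadata, which is why the filename is available to the versioner on the client.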
