Hi Zichen,

Answers below:

-----Original Message-----
From: Zichen Nie <[email protected]>
Date: Monday, November 3, 2014 at 10:32 AM
To: Chris Mattmann <[email protected]>
Subject: Re: TikaCmdLineMetExtractor does not generate .met file

>Yes. That's totally make sense! But here comes the question,
>1. Do we have to generate .met file for our homework 2? How are we make
>use of .met files?

You only have to generate .met files if you need them to crawl. If you
are using MetExtractorProductCrawler, then no, you don’t need a met file.

>2. How to customize the metadata files we want to create? In the cas-pge
>example, I don't understand why there are key and value pairs like "less
>than" and "%3C" in the final met file. The corresponding metadata
>configuration specifies:<customMetadata><metadata
> key="LessThan" val="<"/>...<customMetadata>. But if I add some
>other key on my own, it fails to show in the final met file.

Please read:

https://cwiki.apache.org/confluence/display/OODT/Understanding+the+flow+of+Metadata+during+PGE+based+Processing
https://cwiki.apache.org/confluence/display/OODT/Understanding+CAS-PGE+Metadata+Precendence

>
>3. When I tried to create my own workflow using cas-pge, for example,
>just copy a file to another place, the terminal said "ingestion is
>failed" because of "missing required metadata", however, in my
>destination folder the new file is copied and .met is
> generated. There is only one key value pairs in my .met which is JobID.
>I am really confused.

requiredMetadata is defined in the workflow manager tasks.xml on a
per-task basis. Please refer to the requiredMetadata for your task and
confirm that you provided it.

>
>
>I must be missing something in the configuration process, the
>message"missing required metadata" I saw a lot of times and even through
>successful ingestions. Any suggestions?

See above.

Cheers,
Chris

>
>
>Best,
>Zichen

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Chief Architect
Instrument Software and Science Data Systems Section (398)
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 168-519, Mailstop: 168-527
Email: [email protected]
WWW: http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Associate Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

>
>2014-11-02 19:31 GMT-08:00 Mattmann, Chris A (3980)
><[email protected]>:
>
>Hi Zichen,
>
>Thanks for your mail. If you use MetExtractorProductCrawler, met
>is generated, but it's never serialized to disk. I think that explains
>it. Let me know if that makes sense.
>
>Cheers,
>Chris
>
>++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>Chris Mattmann, Ph.D.
>Chief Architect
>Instrument Software and Science Data Systems Section (398)
>NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
>Office: 168-519, Mailstop: 168-527
>Email: [email protected]
>WWW: http://sunset.usc.edu/~mattmann/
>++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>Adjunct Associate Professor, Computer Science Department
>University of Southern California, Los Angeles, CA 90089 USA
>++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>
>-----Original Message-----
>From: Zichen Nie <[email protected]>
>Date: Saturday, November 1, 2014 at 4:47 PM
>To: Chris Mattmann <[email protected]>
>Subject: TikaCmdLineMetExtractor does not generate .met file
>
>>Dear Professor:
>>
>>I followed the instruction on how to use OODT cas-crawler, and tried to
>>generate .met file using TikaCmdLineExtractor.
>>I can see from the log that Tika is extracting my metadata but it does
>>not generate .met file for my json file.
>>
>>Here is my command line:
>>
>>./crawler_launcher --operation --launchMetCrawler -filemgrUrl
>>http://localhost:9000 --clientTransferer
>>org.apache.oodt.cas.filemgr.datatransfer.LocalDataTransferFactory
>>--productPath
>>/Users/threeears/Documents/572/Assignment2/oodt-deploy/cas-crawler-0.7/data/test/0.json
>>--metExtractor
>>org.apache.oodt.cas.metadata.extractors.TikaCmdLineMetExtractor
>>--metExtractorConfig
>>/Users/threeears/Documents/572/Assignment2/oodt-deploy/cas-crawler-0.7/extractors/tikaextractor/tikaextractor.config
>>--metFileExtension met
>>
>>I thought MetCrawler should generate meta file before ingestion, it's
>>weird that my ingestion is successful and met file is not shown. Am I
>>using the right extractor and crawler? Are there any necessary
>>configurations that I missed?
>>
>>Best,
>>Zichen
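
Editor's note on the "missing required metadata" point in the reply at the top of this thread: one way to confirm that a task's required keys were actually provided is to list the keys present in the generated .met file and compare them with the required ones. The sketch below is only an illustration under stated assumptions. It assumes the .met file uses the cas:metadata layout with keyval/key/val elements, and the required key names (ProductType, Filename) and the JobID value are hypothetical placeholders, not taken from any real tasks.xml. The real list has to come from the requiredMetadata of your own task.

```python
import xml.etree.ElementTree as ET

# Assumed .met layout: a cas:metadata root holding <keyval> entries, each with
# one <key> and one or more <val> children. The JobID value is made up.
SAMPLE_MET = """<cas:metadata xmlns:cas="http://oodt.jpl.nasa.gov/1.0/cas">
  <keyval>
    <key>JobID</key>
    <val>hypothetical-job-0001</val>
  </keyval>
</cas:metadata>
"""

# Hypothetical required keys, for illustration only. Replace with the
# required metadata actually declared for your task.
REQUIRED = ["JobID", "ProductType", "Filename"]


def met_keys(met_xml):
    """Return the set of metadata key names found in a .met document."""
    root = ET.fromstring(met_xml)
    keys = set()
    for keyval in root.iter("keyval"):
        key = keyval.findtext("key")
        if key:
            keys.add(key.strip())
    return keys


def missing_required(met_xml, required):
    """Return the required key names that are absent, in the given order."""
    present = met_keys(met_xml)
    return [name for name in required if name not in present]


if __name__ == "__main__":
    # With the sample above, only JobID is present.
    print(missing_required(SAMPLE_MET, REQUIRED))  # ['ProductType', 'Filename']
```

To check a real file, read its contents and pass the text to missing_required in place of SAMPLE_MET. A .met file containing only JobID, as described in question 3, would report every other required key as missing.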
