Hi Konstantinos,

This may be a long shot, but I have come up with a few things you could look at to try to solve your issue. I am working with version 0.10, so the files and locations I reference in this email pertain only to 0.10. Keep that in mind if I reference a file that you can't find: there may have been a change between 0.10 and 0.12 that I haven't looked at yet.

The first thing I notice is the filename you are using for your mime types, 'mimetypes.xml'. I know the filename shouldn't make a difference as long as all the references to the file are the same. But there are many references to the mime types file throughout the system, and, depending on which original *.xml files you based your system on, it is very easy to have one of those references set to something different from the others.

If you look in the filemgr/etc directory, the default name for the mime types file is 'mime-types.xml'.

If you look in the filemgr/etc/filemgr.properties file, there is a property setting that implies the default filename is 'mime-types.xml':

    # location of Mime-Type repository
    org.apache.oodt.cas.filemgr.mime.type.repository=/path/to/mime-types.xml

If you look in the example mime-extractor-map.xml file in the pge/etc/examples directory, the mime repository is set to 'mime-types.xml':

    <cas:mimetypemap xmlns:cas="http://oodt.jpl.nasa.gov/1.0/cas" magic="false" mimeRepo="mime-types.xml">

If you look in the crawler/policy directory, there is a default mime types file named 'mimetypes.xml', but the default mime-extractor-map.xml file in that same directory sets the mime repository to 'path/to/tika-mimetypes/xml/file':

    <cas:mimetypemap xmlns:cas="http://oodt.jpl.nasa.gov/1.0/cas" magic="true or false" mimeRepo="path/to/tika-mimetypes/xml/file">

In addition, if you download the source code for the 'metadata' component and look in metadata/src/main/java/org/apache/oodt/cas/metadata/util/MimeTypeUtils.java, it sets the default name of the mime types file to 'tika-mimetypes.xml' in this line of code:

    public final static String MIME_FILE_RES_PATH = "tika-mimetypes.xml";

So the first thing you should do is make sure all of your references to your mime types file are the same.
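
If it helps, here is a rough Python sketch of one way to check that consistency. It only looks for the three kinds of reference quoted above (the filemgr property, the mimeRepo attribute, and the MIME_FILE_RES_PATH constant); the function names are made up for illustration and are not part of OODT, and your installation may reference the file in other places that these patterns will not catch.

```python
import re
from pathlib import PurePosixPath

# Patterns for the reference styles quoted in this email. This is an
# assumption about where references live, not a complete list for OODT.
REFERENCE_PATTERNS = [
    # filemgr.properties entry (commented-out lines are skipped)
    re.compile(
        r"^\s*org\.apache\.oodt\.cas\.filemgr\.mime\.type\.repository\s*=\s*(\S+)",
        re.MULTILINE,
    ),
    # mimeRepo attribute in a mime-extractor-map.xml file
    re.compile(r'mimeRepo\s*=\s*"([^"]+)"'),
    # default resource name compiled into MimeTypeUtils.java
    re.compile(r'MIME_FILE_RES_PATH\s*=\s*"([^"]+)"'),
]


def find_mime_repo_refs(text):
    """Return every mime types file reference found in one file's text."""
    refs = []
    for pattern in REFERENCE_PATTERNS:
        refs.extend(match.group(1) for match in pattern.finditer(text))
    return refs


def distinct_file_names(texts):
    """Collect the distinct file names (last path component) across files."""
    names = set()
    for text in texts:
        for ref in find_mime_repo_refs(text):
            names.add(PurePosixPath(ref).name)
    return sorted(names)


# Small demonstration on two snippets taken from this email.
sample_properties = (
    "# location of Mime-Type repository\n"
    "org.apache.oodt.cas.filemgr.mime.type.repository=/path/to/mime-types.xml\n"
)
sample_extractor_map = '<cas:mimetypemap magic="false" mimeRepo="mimetypes.xml">'

names = distinct_file_names([sample_properties, sample_extractor_map])
if len(names) > 1:
    print("Inconsistent mime types file names:", ", ".join(names))
```

To run it against a real deployment you would read the text of each properties, XML, and Java file and pass the list to distinct_file_names; more than one name in the result means at least one reference disagrees with the others.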

There are several places (in several classes) where the MimeTypeUtils class is used, and you need to make sure that each instantiation of the class is using the same mime types file. A quick search of the source code revealed that MimeTypeUtils is referenced in the following places:

    ./crawler/src/main/java/org/apache/oodt/cas/crawl/typedetection/MimeExtractorRepo.java
    ./pushpull/src/main/java/org/apache/oodt/cas/pushpull/retrievalsystem/FileRetrievalSystem.java
    ./protocol/http/src/main/java/org/apache/oodt/cas/protocol/http/util/HttpUtils.java
    ./metadata/src/main/java/org/apache/oodt/cas/metadata/util/MimeTypeUtils.java
    ./metadata/src/main/java/org/apache/oodt/cas/metadata/preconditions/MimeTypeComparator.java
    ./metadata/src/test/org/apache/oodt/cas/metadata/util/TestMimeTypeUtils.java

For example, the MimeTypeComparator.java class has a method called setMimeTypeRepo to set the mime repository name, but there is no code in the system that actually calls MimeTypeComparator::setMimeTypeRepo. So if you are using MimeTypeComparator as one of your preconditions, MimeTypeUtils was instantiated with its default constructor, which sets its internal mime repository to the MIME_FILE_RES_PATH shown above. That is probably not what you want, because your custom mime type is not in that file, and then you can get the 'no extractor defined' error.

The second thing I noticed is how you are defining your custom mime types:

    <mime-type type="text/fastq">
      <glob pattern="*.fastq"/>
      <glob pattern="*.fastq.gz"/>
      <glob pattern="*.fastq.bz"/>
      <glob pattern="*.fastq.bz2"/>
      <glob pattern="*.fastq.bzip"/>
      <glob pattern="*.fq"/>
      <glob pattern="*.fq.gz"/>
      <glob pattern="*.fq.bz"/>
      <glob pattern="*.fq.bz2"/>
      <glob pattern="*.fq.bzip"/>
    </mime-type>

I had to make a change to how I was defining my mime types, and I don't think Tika will like the way you have defined yours. For example, I have a mime type called "product/fei-ecsv", which covers plain text files named *.ecsv.
I had defined it like this:

    <mime-type type="product/fei-ecsv">
      <glob pattern="*.ecsv"/>
    </mime-type>

If I remember correctly, Tika ended up not being able to determine the mime type and defaulted to 'application/octet-stream', for which I did not have an extractor defined, and so I got the 'no extractor defined' errors. To get Tika to recognize my new mime type, I had to add the 'sub-class-of' tag and change my definition to:

    <mime-type type="product/fei-ecsv">
      <sub-class-of type="text/plain"/>
      <glob pattern="*.ecsv"/>
    </mime-type>

I also ran into a problem when I tried to define a mime type for files with an extension that was already defined in the mime types file, even if it was part of a two-part extension that didn't itself exist in the file. For example, I am a little worried you might run into problems with your patterns that end in .gz, .bz, .bz2 and .bzip, even though they also have '.fq' and '.fastq' in the pattern. You might have to split your patterns up into a few different mime types. I hope that you won't have to. But if you do, I'm pretty sure these four types will work as far as Tika is concerned, although doing this might screw up how you have set up your "product types".

    <mime-type type="text/fastq">
      <sub-class-of type="text/plain"/>
      <glob pattern="*.fastq"/>
      <glob pattern="*.fq"/>
    </mime-type>

    <mime-type type="text/fastq-gz">
      <sub-class-of type="application/gzip"/>
      <glob pattern="*.fastq.gz"/>
      <glob pattern="*.fq.gz"/>
    </mime-type>

    <mime-type type="text/fastq-bz">
      <sub-class-of type="application/x-bzip"/>
      <glob pattern="*.fastq.bz"/>
      <glob pattern="*.fastq.bzip"/>
      <glob pattern="*.fq.bz"/>
      <glob pattern="*.fq.bzip"/>
    </mime-type>

    <mime-type type="text/fastq-bz2">
      <sub-class-of type="application/x-bzip2"/>
      <glob pattern="*.fastq.bz2"/>
      <glob pattern="*.fq.bz2"/>
    </mime-type>

I hope this helps! Please let me know if you have any questions.
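
One more rough sketch, in case it is useful for checking a custom mime types file before running the crawler. It flags mime-type entries that have no sub-class-of child (the change that fixed detection for me, though base types such as text/plain legitimately have none) and glob patterns with stray leading or trailing whitespace, which is easy to pick up when XML is pasted through email. The function name audit_mime_types is hypothetical, and the sketch assumes the file is plain XML with mime-type elements somewhere under the root, as in the examples in this thread.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip an XML namespace prefix such as '{uri}mime-type' if present."""
    return tag.rsplit("}", 1)[-1]


def audit_mime_types(xml_text):
    """Return (mime type, message) pairs for entries worth a second look."""
    findings = []
    root = ET.fromstring(xml_text)
    for elem in root.iter():
        if local_name(elem.tag) != "mime-type":
            continue
        mime = elem.get("type", "?")
        child_names = [local_name(child.tag) for child in elem]
        # Candidate for review only: base types need no parent type.
        if "sub-class-of" not in child_names:
            findings.append((mime, "no sub-class-of element"))
        for child in elem:
            if local_name(child.tag) != "glob":
                continue
            pattern = child.get("pattern", "")
            if pattern != pattern.strip():
                findings.append((mime, f"whitespace in glob pattern {pattern!r}"))
    return findings


# Demonstration on a small inline document.
sample = """<mime-info>
  <mime-type type="text/fastq">
    <glob pattern="*.fastq"/>
    <glob pattern="*.fq "/>
  </mime-type>
  <mime-type type="product/fei-ecsv">
    <sub-class-of type="text/plain"/>
    <glob pattern="*.ecsv"/>
  </mime-type>
</mime-info>"""

for mime, message in audit_mime_types(sample):
    print(f"{mime}: {message}")
```

This only inspects the XML. It does not run Tika, so it cannot tell you which type Tika will actually pick for a given file.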

I spent a huge amount of time debugging the 'no extractor found' error, and a huge amount of time upgrading through each new version from 0.6 to 0.10, so I'm hoping my struggles can help someone else :)

Val

Valerie A. Mallder
New Horizons Deputy Mission System Engineer
Johns Hopkins University/Applied Physics Laboratory

> -----Original Message-----
> From: Konstantinos Mavrommatis [mailto:[email protected]]
> Sent: Wednesday, April 06, 2016 9:48 PM
> To: [email protected]
> Subject: RE: Transition from OODT 0.6 to 0.12 cannot find extractor specifications
>
> I am giving up on this....
> I had used [1] in the first place to setup oodt (v0.6 back then) my setup in the new system is identical to the old one.
> I could not make much out of [0]. Among other things I tried to copy the files in the old crawler/policy to the new crawler/policy - which included some legacy-cmd-line-options.xml, legacy-cmd-line actions.xml. I also tried to reinstall the full oodt on the client side, but still did not work.
>
> I ended up reverting to the older version (0.6) which I run on my client. The server (which runs FM) is still 0.12, but the combination seems to be working fine.
>
> K
>
> -----Original Message-----
> From: Lewis John Mcgibbney [mailto:[email protected]]
> Sent: Tuesday, April 05, 2016 3:33 AM
> To: [email protected]
> Subject: Re: Transition from OODT 0.6 to 0.12 cannot find extractor specifications
>
> Hi K,
> OK so I did a bit of searching here and located a bunch of files which are defined as legacy...
> you can check the search results out below
> https://github.com/apache/oodt/search?utf8=%E2%9C%93&q=AutoDetectProductCrawler&type=Code
> I would urge you to have a look at the AutoDetectProductCrawler Javadoc description included in master branch [0] as well to see if you've got everything required.
> Finally, I came across some documentation on the wiki which may guide you in the right direction [1]. It may also be outdated though so please let us know if that is the case.
> hth
>
> [0] https://github.com/apache/oodt/blob/91d0bafe71124906bd94baad746189caf35fb39c/crawler/src/main/java/org/apache/oodt/cas/crawl/AutoDetectProductCrawler.java#L40-L64
> [1] https://cwiki.apache.org/confluence/display/OODT/Mime+type+detection+with+the+AutoDetectProductCrawler
>
> On Mon, Apr 4, 2016 at 10:54 PM, Konstantinos Mavrommatis <[email protected]> wrote:
>
> > Hi,
> > It seems to be happening for a number of types of files that I have in the mimetypes.xml.
> > A few things are puzzling to me: this file which is a .gz file is not processed by the regular tika mimetypes which contains the gzip files
> > A file that has no extension, which defaults to txt is passed to the MetExtractor.pl and processed.
> > > > Any ideas I can find what are the preconditions that fail ? I tried to > > change the log level to DEBUG for all components but I did not get > > much more information. This must be something that changed in the OODT > > releases > > >0.6 but could not find anything relevant in the release notes. > > I also noticed in the documentation of the AutoDecectProductCrawler > > that it uses the file met-extr-preconditions.xml which I could not > > find anywhere in the deployed OODT or the src directories. Could that > > be a reason for the problem I observe? > > > > Thanks > > K > > > > -----Original Message----- > > From: Lewis John Mcgibbney [mailto:[email protected]] > > Sent: Monday, April 04, 2016 3:24 PM > > To: [email protected] > > Subject: Re: Transition from OODT 0.6 to 0.12 cannot find extractor > > specifications > > > > Hi Konstantinos, > > It appears to be happening with a tar.gz file as well right? > > > > WARNING: No extractor specs specified for > > /mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fast > > q/cas-crawler-04-02-16.log.gz > > > > I wonder if it is the file names... However I would be extremely > > surprised as I've seen some much more verbose file naming. > > Lewis > > > > On Saturday, April 2, 2016, Konstantinos Mavrommatis < > > [email protected]> wrote: > > > > > Hi, > > > I am trying to replicate a fully functional service that I had setup > > > long time ago using OODT 0.6 but I am having the following problem > > > that does not allow me to ingest files. When I try to ingest files > > > with the extension fastq.gz I get the line: > > > WARNING: No extractor specs specified for > > > /mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fa > > > st q/E837642_R1.fastq.gz Apr 02, 2016 10:12:14 PM > > > org.apache.oodt.cas.crawl.ProductCrawler > > > handleFile > > > And of course the file is not ingested. This process works without > > > problem with OODT 0.6 on a different server. 
> > > > > > The crawler command I am running is: > > > ./crawler_launcher \ > > > --operation \ > > > --launchAutoCrawler \ > > > --productPath $FILEPATH \ > > > --filemgrUrl $OODT_FILEMGR_URL \ > > > --clientTransferer > > > org.apache.oodt.cas.filemgr.datatransfer.InPlaceDataTransferFactory > > > \ --mimeExtractorRepo ../policy/mime-extractor-map.xml \ --noRecur \ > > > --crawlForDirs 2>&1 > > > > > > > > > > > > I have setup OODT 0.12 on a server which runs FM listening to port 9000. > > > From a client machine I have verified that I can use FM to ingest > > products. > > > I am now trying to use crawler to crawl and ingest all files in a > > > directory. Since I have non standard MIME types in these directories > > > I have done the following: > > > 1. Added my own mime types in policy/mimetypes.xml eg > > > <mime-type type="text/fastq"> > > > <glob pattern="*.fastq"/> > > > <glob pattern="*.fastq.gz"/> > > > <glob pattern="*.fastq.bz"/> > > > <glob pattern="*.fastq.bz2"/> > > > <glob pattern="*.fastq.bzip"/> > > > <glob pattern="*.fq"/> > > > <glob pattern="*.fq.gz"/> > > > <glob pattern="*.fq.bz"/> > > > <glob pattern="*.fq.bz2"/> > > > <glob pattern="*.fq.bzip"/> > > > </mime-type> > > > 2. created the file policy/mime-extractor-map.xml > > > > > > <mime type="text/fastq"> > > > <extractor > > > class="org.apache.oodt.cas.metadata.extractors.ExternMetExtractor"> > > > <config > > > file="/apache-oodt/crawler/bin/fastq.config"/> > > > <preCondComparators> > > > <preCondComparator > > > id="CheckThatDataFileSizeIsGreaterThanZero"/> > > > </preCondComparators> > > > </extractor> > > > </mime> > > > > > > 3. 
created the file fastq.config
> > > <?xml version="1.0" encoding="UTF-8"?>
> > > <cas:externextractor xmlns:cas="http://oodt.jpl.nasa.gov/1.0/cas">
> > >   <exec workingDir="">
> > >     <extractorBinPath>/apache-oodt/crawler/bin/MetExtractorNGS.pl</extractorBinPath>
> > >     <args>
> > >       <arg isDataFile="true"></arg>
> > >       <arg>fastq</arg>
> > >     </args>
> > >   </exec>
> > > </cas:externextractor>
> > >
> > > The MetExtractorNGS.pl is a small perl script that opens the file to be ingested, gets some information and stores it in the .met file that corresponds to the file to be ingested and have manually verified that works as expected producing the correct met file.
> > >
> > > What am I missing here? Any ideas comments suggestions will be greatly appreciated.
> > > Thanks in advance for any help > > > Kostas > > > > > > > > > > > > PS1 The full output from running the crawler command follows: > > > > > > > > > Setting property 'StdProductCrawler.filemgrUrl' > > > Setting property 'MetExtractorProductCrawler.filemgrUrl' > > > Setting property 'AutoDetectProductCrawler.filemgrUrl' > > > Setting property 'StdProductCrawler.clientTransferer' > > > Setting property 'MetExtractorProductCrawler.clientTransferer' > > > Setting property 'AutoDetectProductCrawler.clientTransferer' > > > Setting property 'StdProductCrawler.noRecur' > > > Setting property 'MetExtractorProductCrawler.noRecur' > > > Setting property 'AutoDetectProductCrawler.noRecur' > > > Setting property 'AutoDetectProductCrawler.mimeExtractorRepo' > > > Setting property 'StdProductCrawler.productPath' > > > Setting property 'MetExtractorProductCrawler.productPath' > > > Setting property 'AutoDetectProductCrawler.productPath' > > > Apr 02, 2016 10:12:13 PM > > > org.springframework.beans.factory.config.PropertyOverrideConfigurer > > > processKey > > > FINE: Property 'AutoDetectProductCrawler.noRecur' set to value > > > [true] Apr 02, 2016 10:12:13 PM > > > org.springframework.beans.factory.config.PropertyOverrideConfigurer > > > processKey > > > FINE: Property 'StdProductCrawler.productPath' set to value > > > [/mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/f > > > as > > > tq] > > > Apr 02, 2016 10:12:13 PM > > > org.springframework.beans.factory.config.PropertyOverrideConfigurer > > > processKey > > > FINE: Property 'MetExtractorProductCrawler.noRecur' set to value > > > [true] Apr 02, 2016 10:12:13 PM > > > org.springframework.beans.factory.config.PropertyOverrideConfigurer > > > processKey > > > FINE: Property 'AutoDetectProductCrawler.mimeExtractorRepo' set to > > > value [../policy/mime-extractor-map.xml] > > > Apr 02, 2016 10:12:13 PM > > > org.springframework.beans.factory.config.PropertyOverrideConfigurer > > > processKey > > > FINE: 
Property 'MetExtractorProductCrawler.clientTransferer' set to > > > value > > > [org.apache.oodt.cas.filemgr.datatransfer.InPlaceDataTransferFactory > > > ] > > > Apr 02, 2016 10:12:13 PM > > > org.springframework.beans.factory.config.PropertyOverrideConfigurer > > > processKey > > > FINE: Property 'AutoDetectProductCrawler.filemgrUrl' set to value [ > > > https://urldefense.proofpoint.com/v2/url?u=http-3A__192.168.8.44-3A9 > > > 00 > > > 0&d=CwIBaQ&c=CZZujK3G2KuXGEKnzq- > Hrg&r=wndYZ4MLMT9l3Zb2WZv2hq2O6yvZ1C > > > s- > > > > T2gHY95y7ZA&m=Qaz0eKz7FHe35NMF43A17ey59ANhAqJD5ZfwZQC0VRo&s=Ov > pwZVR1 > > > Xq gKclL83VXAWh__c7nz87xK_nS-O7hIXqc&e= ] Apr 02, 2016 10:12:13 PM > > > org.springframework.beans.factory.config.PropertyOverrideConfigurer > > > processKey > > > FINE: Property 'AutoDetectProductCrawler.clientTransferer' set to > > > value > > > [org.apache.oodt.cas.filemgr.datatransfer.InPlaceDataTransferFactory > > > ] > > > Apr 02, 2016 10:12:13 PM > > > org.springframework.beans.factory.config.PropertyOverrideConfigurer > > > processKey > > > FINE: Property 'StdProductCrawler.noRecur' set to value [true] Apr > > > 02, > > > 2016 10:12:13 PM > > > org.springframework.beans.factory.config.PropertyOverrideConfigurer > > > processKey > > > FINE: Property 'StdProductCrawler.filemgrUrl' set to value [ > > > https://urldefense.proofpoint.com/v2/url?u=http-3A__192.168.8.44-3A9 > > > 00 > > > 0&d=CwIBaQ&c=CZZujK3G2KuXGEKnzq- > Hrg&r=wndYZ4MLMT9l3Zb2WZv2hq2O6yvZ1C > > > s- > > > > T2gHY95y7ZA&m=Qaz0eKz7FHe35NMF43A17ey59ANhAqJD5ZfwZQC0VRo&s=Ov > pwZVR1 > > > Xq gKclL83VXAWh__c7nz87xK_nS-O7hIXqc&e= ] Apr 02, 2016 10:12:13 PM > > > org.springframework.beans.factory.config.PropertyOverrideConfigurer > > > processKey > > > FINE: Property 'AutoDetectProductCrawler.productPath' set to value > > > [/mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/f > > > as > > > tq] > > > Apr 02, 2016 10:12:13 PM > > > 
org.springframework.beans.factory.config.PropertyOverrideConfigurer > > > processKey > > > FINE: Property 'MetExtractorProductCrawler.filemgrUrl' set to value > > > [ > > > https://urldefense.proofpoint.com/v2/url?u=http-3A__192.168.8.44-3A9 > > > 00 > > > 0&d=CwIBaQ&c=CZZujK3G2KuXGEKnzq- > Hrg&r=wndYZ4MLMT9l3Zb2WZv2hq2O6yvZ1C > > > s- > > > > T2gHY95y7ZA&m=Qaz0eKz7FHe35NMF43A17ey59ANhAqJD5ZfwZQC0VRo&s=Ov > pwZVR1 > > > Xq gKclL83VXAWh__c7nz87xK_nS-O7hIXqc&e= ] Apr 02, 2016 10:12:13 PM > > > org.springframework.beans.factory.config.PropertyOverrideConfigurer > > > processKey > > > FINE: Property 'StdProductCrawler.clientTransferer' set to value > > > [org.apache.oodt.cas.filemgr.datatransfer.InPlaceDataTransferFactory > > > ] > > > Apr 02, 2016 10:12:13 PM > > > org.springframework.beans.factory.config.PropertyOverrideConfigurer > > > processKey > > > FINE: Property 'MetExtractorProductCrawler.productPath' set to value > > > [/mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/f > > > as tq] Apr 02, 2016 10:12:13 PM > > > org.apache.oodt.cas.crawl.ProductCrawler > > > crawl > > > INFO: Crawling > > > /mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fa > > > st q Apr 02, 2016 10:12:13 PM > > > org.apache.oodt.cas.crawl.ProductCrawler > > > handleFile > > > INFO: Handling file > > > /mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fa > > > st > > > q/E837642_R1.fastq.gz > > > Apr 02, 2016 10:12:14 PM > > > org.apache.oodt.cas.crawl.AutoDetectProductCrawler > > > passesPreconditions > > > WARNING: No extractor specs specified for > > > /mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fa > > > st q/E837642_R1.fastq.gz Apr 02, 2016 10:12:14 PM > > > org.apache.oodt.cas.crawl.ProductCrawler > > > handleFile > > > WARNING: Failed to pass preconditions for ingest of product: > > > [/mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/f > > > as tq/E837642_R1.fastq.gz] Apr 02, 2016 10:12:14 PM > > 
> org.apache.oodt.cas.crawl.ProductCrawler > > > handleFile > > > INFO: Handling file > > > /mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fa > > > st > > > q/E837642_R1.fastq.gz.met > > > Apr 02, 2016 10:12:14 PM > > > org.apache.oodt.cas.metadata.preconditions.PreCondEvalUtils eval > > > INFO: Passed precondition comparator id > > > CheckThatDataFileSizeIsGreaterThanZero > > > Apr 02, 2016 10:12:14 PM > > > org.apache.oodt.cas.metadata.extractors.ExternMetExtractor > > > extrMetadata > > > INFO: Generating met file for product file: > > > [/mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/f > > > as > > > tq/E837642_R1.fastq.gz.met] > > > Apr 02, 2016 10:12:14 PM > > > org.apache.oodt.cas.metadata.extractors.ExternMetExtractor > > > extrMetadata > > > INFO: Executing command line: > > > [/celgene/software/apache-oodt/crawler/bin/MetExtractorNGS.pl > > > /mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fa > > > st > > > q/E837642_R1.fastq.gz.met > > > text ] with workingDir: > > > [/mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/f > > > as > > > tq] > > > to extract metadata > > > OUTPUT: [WARN : MetExtractorNGS - 2016/04/02 22:12:15] - Input file > > > /mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fa > > > st q/E837642_R1.fastq.gz.met will be ignored. .met files are not > > > processed ! 
> > > Apr 02, 2016 10:12:15 PM org.apache.oodt.cas.crawl.ProductCrawler > > > handleFile > > > SEVERE: Failed to get metadata for product : Met extractor failed to > > > create metadata file > > > org.apache.oodt.cas.metadata.exceptions.MetExtractionException: Met > > > extractor failed to create metadata file > > > at > > > > > org.apache.oodt.cas.metadata.extractors.ExternMetExtractor.extrMetadat > > a(ExternMetExtractor.java:120) > > > at > > > > > org.apache.oodt.cas.metadata.AbstractMetExtractor.extractMetadata(Abst > > ractMetExtractor.java:74) > > > at > > > > > org.apache.oodt.cas.crawl.AutoDetectProductCrawler.getMetadataForProdu > > ct(AutoDetectProductCrawler.java:84) > > > at > > > > > org.apache.oodt.cas.crawl.ProductCrawler.handleFile(ProductCrawler.jav > > a:136) > > > at > > > org.apache.oodt.cas.crawl.ProductCrawler.crawl(ProductCrawler.java:104) > > > at > > > org.apache.oodt.cas.crawl.ProductCrawler.crawl(ProductCrawler.java:74) > > > at > > > > > org.apache.oodt.cas.crawl.cli.action.CrawlerLauncherCliAction.execute( > > CrawlerLauncherCliAction.java:58) > > > at > > > org.apache.oodt.cas.cli.CmdLineUtility.execute(CmdLineUtility.java:331) > > > at > > > org.apache.oodt.cas.cli.CmdLineUtility.run(CmdLineUtility.java:188) > > > at > > > org.apache.oodt.cas.crawl.CrawlerLauncher.main(CrawlerLauncher.java: > > > 36 > > > ) > > > > > > Apr 02, 2016 10:12:15 PM org.apache.oodt.cas.crawl.ProductCrawler > > > handleFile > > > INFO: Handling file > > > /mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fa > > > st > > > q/E837642_R2.fastq.gz > > > Apr 02, 2016 10:12:15 PM > > > org.apache.oodt.cas.crawl.AutoDetectProductCrawler > > > passesPreconditions > > > WARNING: No extractor specs specified for > > > /mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fa > > > st q/E837642_R2.fastq.gz Apr 02, 2016 10:12:15 PM > > > org.apache.oodt.cas.crawl.ProductCrawler > > > handleFile > > > WARNING: Failed to pass preconditions for 
ingest of product: > > > [/mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/f > > > as tq/E837642_R2.fastq.gz] Apr 02, 2016 10:12:15 PM > > > org.apache.oodt.cas.crawl.ProductCrawler > > > handleFile > > > INFO: Handling file > > > /mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fa > > > st > > > q/E837642_R2.fastq.gz.met > > > Apr 02, 2016 10:12:15 PM > > > org.apache.oodt.cas.metadata.preconditions.PreCondEvalUtils eval > > > INFO: Passed precondition comparator id > > > CheckThatDataFileSizeIsGreaterThanZero > > > Apr 02, 2016 10:12:16 PM > > > org.apache.oodt.cas.metadata.extractors.ExternMetExtractor > > > extrMetadata > > > INFO: Generating met file for product file: > > > [/mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/f > > > as > > > tq/E837642_R2.fastq.gz.met] > > > Apr 02, 2016 10:12:16 PM > > > org.apache.oodt.cas.metadata.extractors.ExternMetExtractor > > > extrMetadata > > > INFO: Executing command line: > > > [/celgene/software/apache-oodt/crawler/bin/MetExtractorNGS.pl > > > /mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fa > > > st > > > q/E837642_R2.fastq.gz.met > > > text ] with workingDir: > > > [/mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/f > > > as > > > tq] > > > to extract metadata > > > OUTPUT: [WARN : MetExtractorNGS - 2016/04/02 22:12:16] - Input file > > > /mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fa > > > st q/E837642_R2.fastq.gz.met will be ignored. .met files are not > > > processed ! 
> > > Apr 02, 2016 10:12:17 PM org.apache.oodt.cas.crawl.ProductCrawler > > > handleFile > > > SEVERE: Failed to get metadata for product : Met extractor failed to > > > create metadata file > > > org.apache.oodt.cas.metadata.exceptions.MetExtractionException: Met > > > extractor failed to create metadata file > > > at > > > > > org.apache.oodt.cas.metadata.extractors.ExternMetExtractor.extrMetadat > > a(ExternMetExtractor.java:120) > > > at > > > > > org.apache.oodt.cas.metadata.AbstractMetExtractor.extractMetadata(Abst > > ractMetExtractor.java:74) > > > at > > > > > org.apache.oodt.cas.crawl.AutoDetectProductCrawler.getMetadataForProdu > > ct(AutoDetectProductCrawler.java:84) > > > at > > > > > org.apache.oodt.cas.crawl.ProductCrawler.handleFile(ProductCrawler.jav > > a:136) > > > at > > > org.apache.oodt.cas.crawl.ProductCrawler.crawl(ProductCrawler.java:104) > > > at > > > org.apache.oodt.cas.crawl.ProductCrawler.crawl(ProductCrawler.java:74) > > > at > > > > > org.apache.oodt.cas.crawl.cli.action.CrawlerLauncherCliAction.execute( > > CrawlerLauncherCliAction.java:58) > > > at > > > org.apache.oodt.cas.cli.CmdLineUtility.execute(CmdLineUtility.java:331) > > > at > > > org.apache.oodt.cas.cli.CmdLineUtility.run(CmdLineUtility.java:188) > > > at > > > org.apache.oodt.cas.crawl.CrawlerLauncher.main(CrawlerLauncher.java: > > > 36 > > > ) > > > > > > Apr 02, 2016 10:12:17 PM org.apache.oodt.cas.crawl.ProductCrawler > > > handleFile > > > INFO: Handling file > > > /mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fa > > > st > > > q/cas-crawler-04-02-16.log.gz > > > Apr 02, 2016 10:12:17 PM > > > org.apache.oodt.cas.crawl.AutoDetectProductCrawler > > > passesPreconditions > > > WARNING: No extractor specs specified for > > > /mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fa > > > st q/cas-crawler-04-02-16.log.gz Apr 02, 2016 10:12:17 PM > > > org.apache.oodt.cas.crawl.ProductCrawler > > > handleFile > > > WARNING: Failed to pass 
preconditions for ingest of product: > > > [/mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/f > > > as tq/cas-crawler-04-02-16.log.gz] Apr 02, 2016 10:12:17 PM > > > org.apache.oodt.cas.crawl.ProductCrawler > > > handleFile > > > INFO: Handling file > > > /mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fa > > > st > > > q/cas-crawler-04-02-16.tar.gz > > > Apr 02, 2016 10:12:17 PM > > > org.apache.oodt.cas.crawl.AutoDetectProductCrawler > > > passesPreconditions > > > WARNING: No extractor specs specified for > > > /mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fa > > > st q/cas-crawler-04-02-16.tar.gz Apr 02, 2016 10:12:17 PM > > > org.apache.oodt.cas.crawl.ProductCrawler > > > handleFile > > > WARNING: Failed to pass preconditions for ingest of product: > > > [/mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/f > > > as tq/cas-crawler-04-02-16.tar.gz] Apr 02, 2016 10:12:17 PM > > > org.apache.oodt.cas.crawl.ProductCrawler > > > handleFile > > > INFO: Handling file > > > /mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fa > > > st > > > q/cas-crawler-mnt-celgene.rnd.combio.mmgp.external-TestSeqData-RNA-S > > > eq > > > -RawData-fastq-04-02-16.tar.gz > > > Apr 02, 2016 10:12:17 PM > > > org.apache.oodt.cas.crawl.AutoDetectProductCrawler > > > passesPreconditions > > > WARNING: No extractor specs specified for > > > /mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fa > > > st > > > q/cas-crawler-mnt-celgene.rnd.combio.mmgp.external-TestSeqData-RNA-S > > > eq -RawData-fastq-04-02-16.tar.gz Apr 02, 2016 10:12:17 PM > > > org.apache.oodt.cas.crawl.ProductCrawler > > > handleFile > > > WARNING: Failed to pass preconditions for ingest of product: > > > [/mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/f > > > as > > > tq/cas-crawler-mnt-celgene.rnd.combio.mmgp.external-TestSeqData-RNA- > > > Se q-RawData-fastq-04-02-16.tar.gz] Apr 02, 2016 10:12:17 PM 
> > > org.apache.oodt.cas.crawl.ProductCrawler > > > handleFile > > > INFO: Handling file > > > /mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fa > > > st > > > q/test > > > Apr 02, 2016 10:12:17 PM > > > org.apache.oodt.cas.metadata.preconditions.PreCondEvalUtils eval > > > INFO: Passed precondition comparator id > > > CheckThatDataFileSizeIsGreaterThanZero > > > Apr 02, 2016 10:12:17 PM > > > org.apache.oodt.cas.metadata.extractors.ExternMetExtractor > > > extrMetadata > > > INFO: Generating met file for product file: > > > [/mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/f > > > as > > > tq/test] > > > Apr 02, 2016 10:12:17 PM > > > org.apache.oodt.cas.metadata.extractors.ExternMetExtractor > > > extrMetadata > > > INFO: Executing command line: > > > [/celgene/software/apache-oodt/crawler/bin/MetExtractorNGS.pl > > > /mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fa > > > st > > > q/test > > > text ] with workingDir: > > > [/mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/f > > > as > > > tq] > > > to extract metadata > > > OUTPUT: [DEBUG : MetExtractorNGS - 2016/04/02 22:12:18] - Accessing > > > NGS server at > > > https://urldefense.proofpoint.com/v2/url?u=http-3A__192.168.8.44-3A8 > > > 08 > > > 2_RPC2&d=CwIBaQ&c=CZZujK3G2KuXGEKnzq- > Hrg&r=wndYZ4MLMT9l3Zb2WZv2hq2O6 > > > yv > > > Z1Cs- > T2gHY95y7ZA&m=Qaz0eKz7FHe35NMF43A17ey59ANhAqJD5ZfwZQC0VRo&s=tS > c > > > i2 Q1bJj0cQnBHjjOwtZjjx9uNMoN5Bi-ABG0Q7Y4&e= > > > OUTPUT: [DEBUG : metadataPrepare - 2016/04/02 22:12:18] - addMetadata: > > > metadata for file_host are not in array format.Converting.. > > > OUTPUT: [DEBUG : metadataPrepare - 2016/04/02 22:12:18] - addMetadata: > > > adding key/value [file_host]/[ip-192-168-8-66] > > > OUTPUT: [DEBUG : metadataPrepare - 2016/04/02 22:12:18] - addMetadata: > > > metadata for ProductType are not in array format.Converting.. 
> > > OUTPUT: [DEBUG : metadataPrepare - 2016/04/02 22:12:18] - addMetadata: adding key/value [ProductType]/[GenericFile]
> > > OUTPUT: [DEBUG : metadataPrepare - 2016/04/02 22:12:18] - addMetadata: metadata for ingest_user are not in array format.Converting..
> > > OUTPUT: [DEBUG : metadataPrepare - 2016/04/02 22:12:18] - addMetadata: adding key/value [ingest_user]/[kmavrommatis]
> > > OUTPUT: [DEBUG : MetExtractorNGS - 2016/04/02 22:12:18] - The file path is ARRAY(0x22d3f48). It will be added under the FilePath metadata field
> > > OUTPUT: [DEBUG : metadataPrepare - 2016/04/02 22:12:18] - addMetadata: metadata for FilePath are not in array format.Converting..
> > > OUTPUT: [DEBUG : metadataPrepare - 2016/04/02 22:12:18] - addMetadata: adding key/value [FilePath]/[/mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/test]
> > > OUTPUT: [DEBUG : MetExtractorNGS - 2016/04/02 22:12:18] - This file is of type text
> > > OUTPUT: [DEBUG : MetExtractorNGS - 2016/04/02 22:12:18] - Storing metadata in file /mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/test.met
> > > OUTPUT: [DEBUG : metadataPrepare - 2016/04/02 22:12:18] - Changing /mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/test to
> > > OUTPUT: [DEBUG : metadataPrepare - 2016/04/02 22:12:18] - /mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/test
> > > OUTPUT: [DEBUG : metadataPrepare - 2016/04/02 22:12:18] - Changing kmavrommatis to
> > > OUTPUT: [DEBUG : metadataPrepare - 2016/04/02 22:12:18] - kmavrommatis
> > > OUTPUT: [DEBUG : metadataPrepare - 2016/04/02 22:12:18] - Changing GenericFile to
> > > OUTPUT: [DEBUG : metadataPrepare - 2016/04/02 22:12:18] - GenericFile
> > > OUTPUT: [DEBUG : metadataPrepare - 2016/04/02 22:12:18] - Changing ip-192-168-8-66 to
> > > OUTPUT: [DEBUG : metadataPrepare - 2016/04/02 22:12:18] - ip-192-168-8-66
> > > OUTPUT: [DEBUG : MetExtractorNGS - 2016/04/02 22:12:19] - Process finished SUCCESSFULLY
> > > Apr 02, 2016 10:12:19 PM org.apache.oodt.cas.metadata.extractors.ExternMetExtractor extrMetadata
> > > INFO: Met extraction successful for product file: [/mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/test]
> > > Apr 02, 2016 10:12:19 PM org.apache.oodt.cas.crawl.ProductCrawler ingest
> > > INFO: ProductCrawler: Ready to ingest product: [/mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/test]: ProductType: [GenericFile]
> > > Apr 02, 2016 10:12:19 PM org.apache.oodt.cas.filemgr.ingest.StdIngester setFileManager
> > > INFO: StdIngester: connected to file manager: [https://urldefense.proofpoint.com/v2/url?u=http-3A__192.168.8.44-3A9000&d=CwIBaQ&c=CZZujK3G2KuXGEKnzq-Hrg&r=wndYZ4MLMT9l3Zb2WZv2hq2O6yvZ1Cs-T2gHY95y7ZA&m=Qaz0eKz7FHe35NMF43A17ey59ANhAqJD5ZfwZQC0VRo&s=OvpwZVR1XqgKclL83VXAWh__c7nz87xK_nS-O7hIXqc&e= ]
> > > Apr 02, 2016 10:12:19 PM org.apache.oodt.cas.filemgr.datatransfer.InPlaceDataTransferer setFileManagerUrl
> > > INFO: In Place Data Transfer to: [https://urldefense.proofpoint.com/v2/url?u=http-3A__192.168.8.44-3A9000&d=CwIBaQ&c=CZZujK3G2KuXGEKnzq-Hrg&r=wndYZ4MLMT9l3Zb2WZv2hq2O6yvZ1Cs-T2gHY95y7ZA&m=Qaz0eKz7FHe35NMF43A17ey59ANhAqJD5ZfwZQC0VRo&s=OvpwZVR1XqgKclL83VXAWh__c7nz87xK_nS-O7hIXqc&e= ] enabled
> > > Apr 02, 2016 10:12:19 PM org.apache.oodt.cas.filemgr.ingest.StdIngester ingest
> > > INFO: StdIngester: ingesting product: ProductName: [test]: ProductType: [GenericFile]: FileLocation: [/mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/]
> > > Apr 02, 2016 10:12:19 PM org.apache.oodt.cas.filemgr.system.XmlRpcFileManagerClient ingestProduct
> > > FINEST: File Manager Client: clientTransfer enabled: transfering product [test]
> > > Apr 02, 2016 10:12:19 PM org.apache.oodt.cas.filemgr.versioning.VersioningUtils createBasicDataStoreRefsFlat
> > > FINE: VersioningUtils: Generated data store ref: file:/opt/oodt/data/archive/test/test from origRef: file:/mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/test
> > > Apr 02, 2016 10:12:19 PM org.apache.oodt.cas.crawl.ProductCrawler ingest
> > > INFO: Successfully ingested product: [/mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/test]: product id: 4c8de2da-265a-48c4-8380-3f1103dfecfc
> > > Apr 02, 2016 10:12:19 PM org.apache.oodt.cas.crawl.ProductCrawler handleFile
> > > INFO: Successful ingest of product: [/mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/test]
> > >
> > > *********************************************************
> > > THIS ELECTRONIC MAIL MESSAGE AND ANY ATTACHMENT IS CONFIDENTIAL AND
> > > MAY CONTAIN LEGALLY PRIVILEGED INFORMATION INTENDED ONLY FOR THE USE
> > > OF THE INDIVIDUAL OR INDIVIDUALS NAMED ABOVE.
> > > If the reader is not the intended recipient, or the employee or
> > > agent responsible to deliver it to the intended recipient, you are
> > > hereby notified that any dissemination, distribution or copying of
> > > this communication is strictly prohibited. If you have received this
> > > communication in error, please reply to the sender to notify us of
> > > the error and delete the original message. Thank You.
> >
> > --
> > *Lewis*
>
> --
> *Lewis*
