*bravo Val* comprehensive and insightful. Tom, or someone, we need to get this on the wiki in an FAQ.
--
Chris Mattmann
chris.mattm...@gmail.com

On 4/7/16, 9:16 AM, "Mallder, Valerie" <valerie.mall...@jhuapl.edu> wrote:

>Hi Konstantinos,
>
>This may be a long shot, but I may have come up with a few things you could
>look at to try to solve your issue. I am working with version 0.10, so the
>files and locations that I reference in this email pertain only to 0.10.
>Keep that in mind if I reference a file that you can't find, because there
>could have been a change made between 0.10 and 0.12 that I haven't looked at
>yet.
>
>The first thing I notice is the filename you are using for your mime types,
>'mimetypes.xml'. I know that the filename you use shouldn't make a difference
>as long as all the references to the file are the same. But there are many
>references to the mime type file throughout the system, and, depending on
>which original *.xml files you based your system on, it can be very easy to
>have one of those references set to something different from the others.
>
>If you look in the filemgr/etc directory, the default name for the mime types
>file is 'mime-types.xml'.
>
>If you look in the filemgr/etc/filemgr.properties file, there is a property
>setting that implies the default filename is 'mime-types.xml', as in:
>
># location of Mime-Type repository
>org.apache.oodt.cas.filemgr.mime.type.repository=/path/to/mime-types.xml
>
>If you look in the example mime-extractor-map.xml file in the pge/etc/examples
>directory, the mime repository is set to 'mime-types.xml', as in:
>
><cas:mimetypemap xmlns:cas="http://oodt.jpl.nasa.gov/1.0/cas" magic="false"
>mimeRepo="mime-types.xml">
>
>If you look in the crawler/policy directory, there is a default mime types
>file named 'mimetypes.xml', but the default mime-extractor-map.xml file in
>that same directory sets the mime repository to
>'path/to/tika-mimetypes/xml/file', as in:
>
><cas:mimetypemap xmlns:cas="http://oodt.jpl.nasa.gov/1.0/cas" magic="true or
>false" mimeRepo="path/to/tika-mimetypes/xml/file">
>
>In addition, if you download the source code for the 'metadata' component and
>look in the
>metadata/src/main/java/org/apache/oodt/cas/metadata/util/MimeTypeUtils.java
>file, it sets the default name of the mime types file to 'tika-mimetypes.xml',
>as in this line of code:
>
>public final static String MIME_FILE_RES_PATH = "tika-mimetypes.xml";
>
>
>So, the first thing you should do is make sure all of your references to your
>mime types file are the same. There are several places (or several classes)
>where the MimeTypeUtils class is used, and you need to make sure that each
>instantiation of the class is using the same mime types file.
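One way to check this across a deployment is to pull every reference to the mime types file out of the config files and compare the file names. The following is only a rough Python sketch: it covers just the two kinds of reference quoted above (the filemgr.properties key and the mimeRepo attribute), the helper names are made up for illustration, and the mimeRepo path in the sample is a hypothetical value, not taken from a real install.

```python
import re
from pathlib import PurePosixPath

# Only the two reference styles quoted above are covered; other files
# (or Java defaults such as MIME_FILE_RES_PATH) are not inspected.
PROPERTY_RE = re.compile(
    r"^\s*org\.apache\.oodt\.cas\.filemgr\.mime\.type\.repository\s*=\s*(\S+)",
    re.MULTILINE,
)
MIME_REPO_ATTR_RE = re.compile(r'mimeRepo\s*=\s*"([^"]*)"')


def mime_repo_references(text):
    """Return every mime types file path referenced in one config file's text."""
    return PROPERTY_RE.findall(text) + MIME_REPO_ATTR_RE.findall(text)


def distinct_file_names(config_texts):
    """Map each referenced file name to the labels of the configs that use it."""
    names = {}
    for label, text in config_texts.items():
        for ref in mime_repo_references(text):
            names.setdefault(PurePosixPath(ref).name, []).append(label)
    return names


# Sample inputs: the property line is quoted from the email; the mimeRepo
# value is a hypothetical path chosen to show a mismatch.
samples = {
    "filemgr.properties": (
        "# location of Mime-Type repository\n"
        "org.apache.oodt.cas.filemgr.mime.type.repository=/path/to/mime-types.xml\n"
    ),
    "mime-extractor-map.xml": (
        '<cas:mimetypemap magic="false" mimeRepo="../policy/mimetypes.xml">'
    ),
}

names = distinct_file_names(samples)
for name, users in sorted(names.items()):
    print(name, "<-", ", ".join(users))
if len(names) > 1:
    print("references to the mime types file disagree")
```

In a real check, the sample strings would be replaced by the contents of the actual config files read from disk.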
>
>A quick search of the source code revealed that MimeTypeUtils is referenced in
>the following places:
>./crawler/src/main/java/org/apache/oodt/cas/crawl/typedetection/MimeExtractorRepo.java
>./pushpull/src/main/java/org/apache/oodt/cas/pushpull/retrievalsystem/FileRetrievalSystem.java
>./protocol/http/src/main/java/org/apache/oodt/cas/protocol/http/util/HttpUtils.java
>./metadata/src/main/java/org/apache/oodt/cas/metadata/util/MimeTypeUtils.java
>./metadata/src/main/java/org/apache/oodt/cas/metadata/preconditions/MimeTypeComparator.java
>./metadata/src/test/org/apache/oodt/cas/metadata/util/TestMimeTypeUtils.java
>
>For example, the MimeTypeComparator class has a method called
>setMimeTypeRepo to set the mime repository name, but there is no code in the
>system that actually calls MimeTypeComparator::setMimeTypeRepo. So, if you are
>using MimeTypeComparator as one of your preconditions, MimeTypeUtils was
>instantiated with its default constructor, which sets its internal mime
>repository to the MIME_FILE_RES_PATH shown above. That is probably not what
>you want, because your custom mime type is not in that file, and then you can
>get the 'no extractor defined' error.
>
>The second thing I noticed is how you are defining your custom mime types.
>
><mime-type type="text/fastq">
>   <glob pattern="*.fastq"/>
>   <glob pattern="*.fastq.gz"/>
>   <glob pattern="*.fastq.bz"/>
>   <glob pattern="*.fastq.bz2"/>
>   <glob pattern="*.fastq.bzip"/>
>   <glob pattern="*.fq"/>
>   <glob pattern="*.fq.gz"/>
>   <glob pattern="*.fq.bz"/>
>   <glob pattern="*.fq.bz2"/>
>   <glob pattern="*.fq.bzip"/>
></mime-type>
>
>I had to make a change to how I was defining my mime types. I don't think Tika
>will like the way you have defined your mime types. For example, I have a
>mime type called "product/fei-ecsv", which covers plain text files named
>*.ecsv. I had defined it like this:
>
><mime-type type="product/fei-ecsv">
>   <glob pattern="*.ecsv"/>
></mime-type>
>
>If I remember correctly, Tika ended up not being able to determine the
>mime type and defaulted to 'application/octet-stream', for which I did
>not have an extractor defined, and so I got the 'no extractor defined' errors.
>So, in order to get Tika to recognize my new mime type, I had to add the
>'sub-class-of' tag and change my definition to:
>
><mime-type type="product/fei-ecsv">
>   <sub-class-of type="text/plain"/>
>   <glob pattern="*.ecsv"/>
></mime-type>
>
>I also ran into a problem when I tried to define a mime type for files whose
>extension was already defined in the mime types file, even when it was part
>of a two-part extension that didn't itself exist in the file. For example,
>I am a little worried you might run into problems with your patterns that end
>in .gz, .bz, .bz2 and .bzip even though they also have '.fq' and '.fastq' in
>the pattern. You might have to split all of your patterns up into a few
>different mime types. I hope that you won't have to. But if you do, then I'm
>pretty sure these 4 types will work as far as Tika is concerned. But doing
>this might screw up how you have set up your 'product types'.
>
><mime-type type="text/fastq">
>   <sub-class-of type="text/plain"/>
>   <glob pattern="*.fastq"/>
>   <glob pattern="*.fq"/>
></mime-type>
>
><mime-type type="text/fastq-gz">
>   <sub-class-of type="application/gzip"/>
>   <glob pattern="*.fastq.gz"/>
>   <glob pattern="*.fq.gz"/>
></mime-type>
>
><mime-type type="text/fastq-bz">
>   <sub-class-of type="application/x-bzip"/>
>   <glob pattern="*.fastq.bz"/>
>   <glob pattern="*.fastq.bzip"/>
>   <glob pattern="*.fq.bz"/>
>   <glob pattern="*.fq.bzip"/>
></mime-type>
>
><mime-type type="text/fastq-bz2">
>   <sub-class-of type="application/x-bzip2"/>
>   <glob pattern="*.fastq.bz2"/>
>   <glob pattern="*.fq.bz2"/>
></mime-type>
>
>
>I hope this helps! Please let me know if you have any questions.
>I spent a huge amount of time debugging the 'no extractor found' error, and
>as a result a huge amount of time upgrading to each new version from 0.6 to
>0.10, so I'm hoping my struggles can help someone else :)
>
>Val
>
>
>
>Valerie A. Mallder
>New Horizons Deputy Mission System Engineer
>Johns Hopkins University/Applied Physics Laboratory
>
>
>> -----Original Message-----
>> From: Konstantinos Mavrommatis [mailto:kmavromma...@celgene.com]
>> Sent: Wednesday, April 06, 2016 9:48 PM
>> To: dev@oodt.apache.org
>> Subject: RE: Transition from OODT 0.6 to 0.12 cannot find extractor
>> specifications
>>
>> I am giving up on this....
>> I had used [1] in the first place to set up OODT (v0.6 back then), and my
>> setup in the new system is identical to the old one.
>> I could not make much out of [0]. Among other things I tried to copy the
>> files in the old crawler/policy to the new crawler/policy, which included
>> some legacy-cmd-line-options.xml, legacy-cmd-line-actions.xml. I also tried
>> to reinstall the full OODT on the client side, but it still did not work.
>>
>> I ended up reverting to the older version (0.6), which I run on my client.
>> The server (which runs FM) is still 0.12, but the combination seems to be
>> working fine.
>>
>> K
>>
>> -----Original Message-----
>> From: Lewis John Mcgibbney [mailto:lewis.mcgibb...@gmail.com]
>> Sent: Tuesday, April 05, 2016 3:33 AM
>> To: dev@oodt.apache.org
>> Subject: Re: Transition from OODT 0.6 to 0.12 cannot find extractor
>> specifications
>>
>> Hi K,
>> OK so I did a bit of searching here and located a bunch of files which are
>> defined as legacy... you can check the search results out below
>> https://github.com/apache/oodt/search?utf8=%E2%9C%93&q=AutoDetectProductCrawler&type=Code
>> I would urge you to have a look at the AutoDetectProductCrawler Javadoc
>> description included in master branch [0] as well to see if you've got
>> everything required.
>> Finally, I came across some documentation on the wiki which may guide you in
>> the right direction [1]. It may also be outdated though, so please let us
>> know if that is the case.
>> hth
>>
>> [0]
>> https://github.com/apache/oodt/blob/91d0bafe71124906bd94baad746189caf35fb39c/crawler/src/main/java/org/apache/oodt/cas/crawl/AutoDetectProductCrawler.java#L40-L64
>> [1]
>> https://cwiki.apache.org/confluence/display/OODT/Mime+type+detection+with+the+AutoDetectProductCrawler
>>
>> On Mon, Apr 4, 2016 at 10:54 PM, Konstantinos Mavrommatis <
>> kmavromma...@celgene.com> wrote:
>>
>> > Hi,
>> > It seems to be happening for a number of types of files that I have in
>> > the mimetypes.xml.
>> > A few things are puzzling to me: this file, which is a .gz file, is not
>> > processed by the regular tika mimetypes, which contains the gzip files.
>> > A file that has no extension, which defaults to txt, is passed to the
>> > MetExtractor.pl and processed.
>> >
>> > Any ideas how I can find out which preconditions fail? I tried to
>> > change the log level to DEBUG for all components but I did not get
>> > much more information. This must be something that changed in the OODT
>> > releases after 0.6, but I could not find anything relevant in the
>> > release notes.
>> > I also noticed in the documentation of the AutoDetectProductCrawler
>> > that it uses the file met-extr-preconditions.xml, which I could not
>> > find anywhere in the deployed OODT or the src directories. Could that
>> > be a reason for the problem I observe?
>> >
>> > Thanks
>> > K
>> >
>> > -----Original Message-----
>> > From: Lewis John Mcgibbney [mailto:lewis.mcgibb...@gmail.com]
>> > Sent: Monday, April 04, 2016 3:24 PM
>> > To: dev@oodt.apache.org
>> > Subject: Re: Transition from OODT 0.6 to 0.12 cannot find extractor
>> > specifications
>> >
>> > Hi Konstantinos,
>> > It appears to be happening with a tar.gz file as well, right?
>> >
>> > WARNING: No extractor specs specified for
>> > /mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/cas-crawler-04-02-16.log.gz
>> >
>> > I wonder if it is the file names... However I would be extremely
>> > surprised as I've seen some much more verbose file naming.
>> > Lewis
>> >
>> > On Saturday, April 2, 2016, Konstantinos Mavrommatis <
>> > kmavromma...@celgene.com> wrote:
>> >
>> > > Hi,
>> > > I am trying to replicate a fully functional service that I had set up a
>> > > long time ago using OODT 0.6, but I am having the following problem
>> > > that does not allow me to ingest files.
>> > > When I try to ingest files with the extension fastq.gz I get the line:
>> > > WARNING: No extractor specs specified for
>> > > /mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/E837642_R1.fastq.gz
>> > > Apr 02, 2016 10:12:14 PM org.apache.oodt.cas.crawl.ProductCrawler handleFile
>> > > And of course the file is not ingested. This process works without
>> > > problem with OODT 0.6 on a different server.
>> > >
>> > > The crawler command I am running is:
>> > > ./crawler_launcher \
>> > > --operation \
>> > > --launchAutoCrawler \
>> > > --productPath $FILEPATH \
>> > > --filemgrUrl $OODT_FILEMGR_URL \
>> > > --clientTransferer org.apache.oodt.cas.filemgr.datatransfer.InPlaceDataTransferFactory \
>> > > --mimeExtractorRepo ../policy/mime-extractor-map.xml \
>> > > --noRecur \
>> > > --crawlForDirs 2>&1
>> > >
>> > >
>> > >
>> > > I have set up OODT 0.12 on a server which runs FM listening on port 9000.
>> > > From a client machine I have verified that I can use FM to ingest
>> > > products.
>> > > I am now trying to use the crawler to crawl and ingest all files in a
>> > > directory. Since I have non-standard MIME types in these directories
>> > > I have done the following:
>> > > 1. Added my own mime types in policy/mimetypes.xml, e.g.
>> > > <mime-type type="text/fastq">
>> > >   <glob pattern="*.fastq"/>
>> > >   <glob pattern="*.fastq.gz"/>
>> > >   <glob pattern="*.fastq.bz"/>
>> > >   <glob pattern="*.fastq.bz2"/>
>> > >   <glob pattern="*.fastq.bzip"/>
>> > >   <glob pattern="*.fq"/>
>> > >   <glob pattern="*.fq.gz"/>
>> > >   <glob pattern="*.fq.bz"/>
>> > >   <glob pattern="*.fq.bz2"/>
>> > >   <glob pattern="*.fq.bzip"/>
>> > > </mime-type>
>> > > 2. Created the file policy/mime-extractor-map.xml
>> > >
>> > > <mime type="text/fastq">
>> > >   <extractor class="org.apache.oodt.cas.metadata.extractors.ExternMetExtractor">
>> > >     <config file="/apache-oodt/crawler/bin/fastq.config"/>
>> > >     <preCondComparators>
>> > >       <preCondComparator id="CheckThatDataFileSizeIsGreaterThanZero"/>
>> > >     </preCondComparators>
>> > >   </extractor>
>> > > </mime>
>> > >
>> > > 3. Created the file fastq.config
>> > > <?xml version="1.0" encoding="UTF-8"?>
>> > > <cas:externextractor xmlns:cas="http://oodt.jpl.nasa.gov/1.0/cas">
>> > >   <exec workingDir="">
>> > >     <extractorBinPath>/apache-oodt/crawler/bin/MetExtractorNGS.pl</extractorBinPath>
>> > >     <args>
>> > >       <arg isDataFile="true"></arg>
>> > >       <arg>fastq</arg>
>> > >     </args>
>> > >   </exec>
>> > > </cas:externextractor>
>> > >
>> > >
>> > >
>> > > The MetExtractorNGS.pl is a small perl script that opens the file to
>> > > be ingested, gets some information and stores it in the .met file
>> > > that corresponds to the file to be ingested. I have manually
>> > > verified that it works as expected, producing the correct met file.
>> > >
>> > > What am I missing here? Any ideas, comments or suggestions will be
>> > > greatly appreciated.
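For what it's worth, the link between the detected mime type and the 'No extractor specs' warning can be illustrated with a small Python sketch. It assumes, as Val describes in her reply, that extractors are looked up by whatever mime type detection returns, so a file detected as something other than text/fastq finds no entry. The root element and its attributes in the sample are hypothetical (the quoted map does not show them), and the function names are illustrative only, not OODT API.

```python
import xml.etree.ElementTree as ET

# Trimmed-down map modelled on the <mime> entry quoted above. The root
# element and its magic/mimeRepo values are assumptions for illustration.
EXTRACTOR_MAP = """
<cas:mimetypemap xmlns:cas="http://oodt.jpl.nasa.gov/1.0/cas"
                 magic="false" mimeRepo="mimetypes.xml">
  <mime type="text/fastq">
    <extractor class="org.apache.oodt.cas.metadata.extractors.ExternMetExtractor">
      <config file="/apache-oodt/crawler/bin/fastq.config"/>
    </extractor>
  </mime>
</cas:mimetypemap>
"""


def extractors_by_mime_type(xml_text):
    """Return {mime type: [extractor class names]} for each <mime> element."""
    root = ET.fromstring(xml_text.strip())
    mapping = {}
    for mime in root.iter("mime"):
        classes = [e.get("class") for e in mime.findall("extractor")]
        mapping.setdefault(mime.get("type"), []).extend(classes)
    return mapping


def report(detected_type, mapping):
    """Describe what the map offers for a given detected mime type."""
    specs = mapping.get(detected_type, [])
    if not specs:
        return f"no extractor entry for {detected_type}"
    return f"{detected_type} -> {', '.join(specs)}"


mapping = extractors_by_mime_type(EXTRACTOR_MAP)
# If detection yields the custom type, an extractor is found ...
print(report("text/fastq", mapping))
# ... but a generic type such as gzip has no entry in this map.
print(report("application/gzip", mapping))
```

Running this against the real mime-extractor-map.xml, with the type that detection actually reports for a .fastq.gz file, would show whether the map has an entry for it.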
>> > > Thanks in advance for any help >> > > Kostas >> > > >> > > >> > > >> > > PS1 The full output from running the crawler command follows: >> > > >> > > >> > > Setting property 'StdProductCrawler.filemgrUrl' >> > > Setting property 'MetExtractorProductCrawler.filemgrUrl' >> > > Setting property 'AutoDetectProductCrawler.filemgrUrl' >> > > Setting property 'StdProductCrawler.clientTransferer' >> > > Setting property 'MetExtractorProductCrawler.clientTransferer' >> > > Setting property 'AutoDetectProductCrawler.clientTransferer' >> > > Setting property 'StdProductCrawler.noRecur' >> > > Setting property 'MetExtractorProductCrawler.noRecur' >> > > Setting property 'AutoDetectProductCrawler.noRecur' >> > > Setting property 'AutoDetectProductCrawler.mimeExtractorRepo' >> > > Setting property 'StdProductCrawler.productPath' >> > > Setting property 'MetExtractorProductCrawler.productPath' >> > > Setting property 'AutoDetectProductCrawler.productPath' >> > > Apr 02, 2016 10:12:13 PM >> > > org.springframework.beans.factory.config.PropertyOverrideConfigurer >> > > processKey >> > > FINE: Property 'AutoDetectProductCrawler.noRecur' set to value >> > > [true] Apr 02, 2016 10:12:13 PM >> > > org.springframework.beans.factory.config.PropertyOverrideConfigurer >> > > processKey >> > > FINE: Property 'StdProductCrawler.productPath' set to value >> > > [/mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/f >> > > as >> > > tq] >> > > Apr 02, 2016 10:12:13 PM >> > > org.springframework.beans.factory.config.PropertyOverrideConfigurer >> > > processKey >> > > FINE: Property 'MetExtractorProductCrawler.noRecur' set to value >> > > [true] Apr 02, 2016 10:12:13 PM >> > > org.springframework.beans.factory.config.PropertyOverrideConfigurer >> > > processKey >> > > FINE: Property 'AutoDetectProductCrawler.mimeExtractorRepo' set to >> > > value [../policy/mime-extractor-map.xml] >> > > Apr 02, 2016 10:12:13 PM >> > > 
org.springframework.beans.factory.config.PropertyOverrideConfigurer >> > > processKey >> > > FINE: Property 'MetExtractorProductCrawler.clientTransferer' set to >> > > value >> > > [org.apache.oodt.cas.filemgr.datatransfer.InPlaceDataTransferFactory >> > > ] >> > > Apr 02, 2016 10:12:13 PM >> > > org.springframework.beans.factory.config.PropertyOverrideConfigurer >> > > processKey >> > > FINE: Property 'AutoDetectProductCrawler.filemgrUrl' set to value [ >> > > https://urldefense.proofpoint.com/v2/url?u=http-3A__192.168.8.44-3A9 >> > > 00 >> > > 0&d=CwIBaQ&c=CZZujK3G2KuXGEKnzq- >> Hrg&r=wndYZ4MLMT9l3Zb2WZv2hq2O6yvZ1C >> > > s- >> > > >> T2gHY95y7ZA&m=Qaz0eKz7FHe35NMF43A17ey59ANhAqJD5ZfwZQC0VRo&s=Ov >> pwZVR1 >> > > Xq gKclL83VXAWh__c7nz87xK_nS-O7hIXqc&e= ] Apr 02, 2016 10:12:13 PM >> > > org.springframework.beans.factory.config.PropertyOverrideConfigurer >> > > processKey >> > > FINE: Property 'AutoDetectProductCrawler.clientTransferer' set to >> > > value >> > > [org.apache.oodt.cas.filemgr.datatransfer.InPlaceDataTransferFactory >> > > ] >> > > Apr 02, 2016 10:12:13 PM >> > > org.springframework.beans.factory.config.PropertyOverrideConfigurer >> > > processKey >> > > FINE: Property 'StdProductCrawler.noRecur' set to value [true] Apr >> > > 02, >> > > 2016 10:12:13 PM >> > > org.springframework.beans.factory.config.PropertyOverrideConfigurer >> > > processKey >> > > FINE: Property 'StdProductCrawler.filemgrUrl' set to value [ >> > > https://urldefense.proofpoint.com/v2/url?u=http-3A__192.168.8.44-3A9 >> > > 00 >> > > 0&d=CwIBaQ&c=CZZujK3G2KuXGEKnzq- >> Hrg&r=wndYZ4MLMT9l3Zb2WZv2hq2O6yvZ1C >> > > s- >> > > >> T2gHY95y7ZA&m=Qaz0eKz7FHe35NMF43A17ey59ANhAqJD5ZfwZQC0VRo&s=Ov >> pwZVR1 >> > > Xq gKclL83VXAWh__c7nz87xK_nS-O7hIXqc&e= ] Apr 02, 2016 10:12:13 PM >> > > org.springframework.beans.factory.config.PropertyOverrideConfigurer >> > > processKey >> > > FINE: Property 'AutoDetectProductCrawler.productPath' set to value >> > > 
[/mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/f >> > > as >> > > tq] >> > > Apr 02, 2016 10:12:13 PM >> > > org.springframework.beans.factory.config.PropertyOverrideConfigurer >> > > processKey >> > > FINE: Property 'MetExtractorProductCrawler.filemgrUrl' set to value >> > > [ >> > > https://urldefense.proofpoint.com/v2/url?u=http-3A__192.168.8.44-3A9 >> > > 00 >> > > 0&d=CwIBaQ&c=CZZujK3G2KuXGEKnzq- >> Hrg&r=wndYZ4MLMT9l3Zb2WZv2hq2O6yvZ1C >> > > s- >> > > >> T2gHY95y7ZA&m=Qaz0eKz7FHe35NMF43A17ey59ANhAqJD5ZfwZQC0VRo&s=Ov >> pwZVR1 >> > > Xq gKclL83VXAWh__c7nz87xK_nS-O7hIXqc&e= ] Apr 02, 2016 10:12:13 PM >> > > org.springframework.beans.factory.config.PropertyOverrideConfigurer >> > > processKey >> > > FINE: Property 'StdProductCrawler.clientTransferer' set to value >> > > [org.apache.oodt.cas.filemgr.datatransfer.InPlaceDataTransferFactory >> > > ] >> > > Apr 02, 2016 10:12:13 PM >> > > org.springframework.beans.factory.config.PropertyOverrideConfigurer >> > > processKey >> > > FINE: Property 'MetExtractorProductCrawler.productPath' set to value >> > > [/mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/f >> > > as tq] Apr 02, 2016 10:12:13 PM >> > > org.apache.oodt.cas.crawl.ProductCrawler >> > > crawl >> > > INFO: Crawling >> > > /mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fa >> > > st q Apr 02, 2016 10:12:13 PM >> > > org.apache.oodt.cas.crawl.ProductCrawler >> > > handleFile >> > > INFO: Handling file >> > > /mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fa >> > > st >> > > q/E837642_R1.fastq.gz >> > > Apr 02, 2016 10:12:14 PM >> > > org.apache.oodt.cas.crawl.AutoDetectProductCrawler >> > > passesPreconditions >> > > WARNING: No extractor specs specified for >> > > /mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fa >> > > st q/E837642_R1.fastq.gz Apr 02, 2016 10:12:14 PM >> > > org.apache.oodt.cas.crawl.ProductCrawler >> > > handleFile >> > > WARNING: Failed to pass 
preconditions for ingest of product: >> > > [/mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/f >> > > as tq/E837642_R1.fastq.gz] Apr 02, 2016 10:12:14 PM >> > > org.apache.oodt.cas.crawl.ProductCrawler >> > > handleFile >> > > INFO: Handling file >> > > /mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fa >> > > st >> > > q/E837642_R1.fastq.gz.met >> > > Apr 02, 2016 10:12:14 PM >> > > org.apache.oodt.cas.metadata.preconditions.PreCondEvalUtils eval >> > > INFO: Passed precondition comparator id >> > > CheckThatDataFileSizeIsGreaterThanZero >> > > Apr 02, 2016 10:12:14 PM >> > > org.apache.oodt.cas.metadata.extractors.ExternMetExtractor >> > > extrMetadata >> > > INFO: Generating met file for product file: >> > > [/mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/f >> > > as >> > > tq/E837642_R1.fastq.gz.met] >> > > Apr 02, 2016 10:12:14 PM >> > > org.apache.oodt.cas.metadata.extractors.ExternMetExtractor >> > > extrMetadata >> > > INFO: Executing command line: >> > > [/celgene/software/apache-oodt/crawler/bin/MetExtractorNGS.pl >> > > /mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fa >> > > st >> > > q/E837642_R1.fastq.gz.met >> > > text ] with workingDir: >> > > [/mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/f >> > > as >> > > tq] >> > > to extract metadata >> > > OUTPUT: [WARN : MetExtractorNGS - 2016/04/02 22:12:15] - Input file >> > > /mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fa >> > > st q/E837642_R1.fastq.gz.met will be ignored. .met files are not >> > > processed ! 
>> > > Apr 02, 2016 10:12:15 PM org.apache.oodt.cas.crawl.ProductCrawler >> > > handleFile >> > > SEVERE: Failed to get metadata for product : Met extractor failed to >> > > create metadata file >> > > org.apache.oodt.cas.metadata.exceptions.MetExtractionException: Met >> > > extractor failed to create metadata file >> > > at >> > > >> > org.apache.oodt.cas.metadata.extractors.ExternMetExtractor.extrMetadat >> > a(ExternMetExtractor.java:120) >> > > at >> > > >> > org.apache.oodt.cas.metadata.AbstractMetExtractor.extractMetadata(Abst >> > ractMetExtractor.java:74) >> > > at >> > > >> > org.apache.oodt.cas.crawl.AutoDetectProductCrawler.getMetadataForProdu >> > ct(AutoDetectProductCrawler.java:84) >> > > at >> > > >> > org.apache.oodt.cas.crawl.ProductCrawler.handleFile(ProductCrawler.jav >> > a:136) >> > > at >> > > org.apache.oodt.cas.crawl.ProductCrawler.crawl(ProductCrawler.java:104) >> > > at >> > > org.apache.oodt.cas.crawl.ProductCrawler.crawl(ProductCrawler.java:74) >> > > at >> > > >> > org.apache.oodt.cas.crawl.cli.action.CrawlerLauncherCliAction.execute( >> > CrawlerLauncherCliAction.java:58) >> > > at >> > > org.apache.oodt.cas.cli.CmdLineUtility.execute(CmdLineUtility.java:331) >> > > at >> > > org.apache.oodt.cas.cli.CmdLineUtility.run(CmdLineUtility.java:188) >> > > at >> > > org.apache.oodt.cas.crawl.CrawlerLauncher.main(CrawlerLauncher.java: >> > > 36 >> > > ) >> > > >> > > Apr 02, 2016 10:12:15 PM org.apache.oodt.cas.crawl.ProductCrawler >> > > handleFile >> > > INFO: Handling file >> > > /mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fa >> > > st >> > > q/E837642_R2.fastq.gz >> > > Apr 02, 2016 10:12:15 PM >> > > org.apache.oodt.cas.crawl.AutoDetectProductCrawler >> > > passesPreconditions >> > > WARNING: No extractor specs specified for >> > > /mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fa >> > > st q/E837642_R2.fastq.gz Apr 02, 2016 10:12:15 PM >> > > org.apache.oodt.cas.crawl.ProductCrawler >> > > 
handleFile >> > > WARNING: Failed to pass preconditions for ingest of product: >> > > [/mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/f >> > > as tq/E837642_R2.fastq.gz] Apr 02, 2016 10:12:15 PM >> > > org.apache.oodt.cas.crawl.ProductCrawler >> > > handleFile >> > > INFO: Handling file >> > > /mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fa >> > > st >> > > q/E837642_R2.fastq.gz.met >> > > Apr 02, 2016 10:12:15 PM >> > > org.apache.oodt.cas.metadata.preconditions.PreCondEvalUtils eval >> > > INFO: Passed precondition comparator id >> > > CheckThatDataFileSizeIsGreaterThanZero >> > > Apr 02, 2016 10:12:16 PM >> > > org.apache.oodt.cas.metadata.extractors.ExternMetExtractor >> > > extrMetadata >> > > INFO: Generating met file for product file: >> > > [/mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/f >> > > as >> > > tq/E837642_R2.fastq.gz.met] >> > > Apr 02, 2016 10:12:16 PM >> > > org.apache.oodt.cas.metadata.extractors.ExternMetExtractor >> > > extrMetadata >> > > INFO: Executing command line: >> > > [/celgene/software/apache-oodt/crawler/bin/MetExtractorNGS.pl >> > > /mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fa >> > > st >> > > q/E837642_R2.fastq.gz.met >> > > text ] with workingDir: >> > > [/mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/f >> > > as >> > > tq] >> > > to extract metadata >> > > OUTPUT: [WARN : MetExtractorNGS - 2016/04/02 22:12:16] - Input file >> > > /mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fa >> > > st q/E837642_R2.fastq.gz.met will be ignored. .met files are not >> > > processed ! 
>> > > Apr 02, 2016 10:12:17 PM org.apache.oodt.cas.crawl.ProductCrawler >> > > handleFile >> > > SEVERE: Failed to get metadata for product : Met extractor failed to >> > > create metadata file >> > > org.apache.oodt.cas.metadata.exceptions.MetExtractionException: Met >> > > extractor failed to create metadata file >> > > at >> > > >> > org.apache.oodt.cas.metadata.extractors.ExternMetExtractor.extrMetadat >> > a(ExternMetExtractor.java:120) >> > > at >> > > >> > org.apache.oodt.cas.metadata.AbstractMetExtractor.extractMetadata(Abst >> > ractMetExtractor.java:74) >> > > at >> > > >> > org.apache.oodt.cas.crawl.AutoDetectProductCrawler.getMetadataForProdu >> > ct(AutoDetectProductCrawler.java:84) >> > > at >> > > >> > org.apache.oodt.cas.crawl.ProductCrawler.handleFile(ProductCrawler.jav >> > a:136) >> > > at >> > > org.apache.oodt.cas.crawl.ProductCrawler.crawl(ProductCrawler.java:104) >> > > at >> > > org.apache.oodt.cas.crawl.ProductCrawler.crawl(ProductCrawler.java:74) >> > > at >> > > >> > org.apache.oodt.cas.crawl.cli.action.CrawlerLauncherCliAction.execute( >> > CrawlerLauncherCliAction.java:58) >> > > at >> > > org.apache.oodt.cas.cli.CmdLineUtility.execute(CmdLineUtility.java:331) >> > > at >> > > org.apache.oodt.cas.cli.CmdLineUtility.run(CmdLineUtility.java:188) >> > > at >> > > org.apache.oodt.cas.crawl.CrawlerLauncher.main(CrawlerLauncher.java: >> > > 36 >> > > ) >> > > >> > > Apr 02, 2016 10:12:17 PM org.apache.oodt.cas.crawl.ProductCrawler >> > > handleFile >> > > INFO: Handling file >> > > /mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fa >> > > st >> > > q/cas-crawler-04-02-16.log.gz >> > > Apr 02, 2016 10:12:17 PM >> > > org.apache.oodt.cas.crawl.AutoDetectProductCrawler >> > > passesPreconditions >> > > WARNING: No extractor specs specified for >> > > /mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fa >> > > st q/cas-crawler-04-02-16.log.gz Apr 02, 2016 10:12:17 PM >> > > 
org.apache.oodt.cas.crawl.ProductCrawler handleFile
>> > > WARNING: Failed to pass preconditions for ingest of product: [/mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/cas-crawler-04-02-16.log.gz]
>> > > Apr 02, 2016 10:12:17 PM org.apache.oodt.cas.crawl.ProductCrawler handleFile
>> > > INFO: Handling file /mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/cas-crawler-04-02-16.tar.gz
>> > > Apr 02, 2016 10:12:17 PM org.apache.oodt.cas.crawl.AutoDetectProductCrawler passesPreconditions
>> > > WARNING: No extractor specs specified for /mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/cas-crawler-04-02-16.tar.gz
>> > > Apr 02, 2016 10:12:17 PM org.apache.oodt.cas.crawl.ProductCrawler handleFile
>> > > WARNING: Failed to pass preconditions for ingest of product: [/mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/cas-crawler-04-02-16.tar.gz]
>> > > Apr 02, 2016 10:12:17 PM org.apache.oodt.cas.crawl.ProductCrawler handleFile
>> > > INFO: Handling file /mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/cas-crawler-mnt-celgene.rnd.combio.mmgp.external-TestSeqData-RNA-Seq-RawData-fastq-04-02-16.tar.gz
>> > > Apr 02, 2016 10:12:17 PM org.apache.oodt.cas.crawl.AutoDetectProductCrawler passesPreconditions
>> > > WARNING: No extractor specs specified for /mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/cas-crawler-mnt-celgene.rnd.combio.mmgp.external-TestSeqData-RNA-Seq-RawData-fastq-04-02-16.tar.gz
>> > > Apr 02, 2016 10:12:17 PM org.apache.oodt.cas.crawl.ProductCrawler handleFile
>> > > WARNING: Failed to pass preconditions for ingest of product: [/mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/cas-crawler-mnt-celgene.rnd.combio.mmgp.external-TestSeqData-RNA-Seq-RawData-fastq-04-02-16.tar.gz]
>> > > Apr 02, 2016 10:12:17 PM org.apache.oodt.cas.crawl.ProductCrawler handleFile
>> > > INFO: Handling file /mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/test
>> > > Apr 02, 2016 10:12:17 PM org.apache.oodt.cas.metadata.preconditions.PreCondEvalUtils eval
>> > > INFO: Passed precondition comparator id CheckThatDataFileSizeIsGreaterThanZero
>> > > Apr 02, 2016 10:12:17 PM org.apache.oodt.cas.metadata.extractors.ExternMetExtractor extrMetadata
>> > > INFO: Generating met file for product file: [/mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/test]
>> > > Apr 02, 2016 10:12:17 PM org.apache.oodt.cas.metadata.extractors.ExternMetExtractor extrMetadata
>> > > INFO: Executing command line: [/celgene/software/apache-oodt/crawler/bin/MetExtractorNGS.pl /mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/test text ] with workingDir: [/mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq] to extract metadata
>> > > OUTPUT: [DEBUG : MetExtractorNGS - 2016/04/02 22:12:18] - Accessing NGS server at https://urldefense.proofpoint.com/v2/url?u=http-3A__192.168.8.44-3A8082_RPC2&d=CwIBaQ&c=CZZujK3G2KuXGEKnzq-Hrg&r=wndYZ4MLMT9l3Zb2WZv2hq2O6yvZ1Cs-T2gHY95y7ZA&m=Qaz0eKz7FHe35NMF43A17ey59ANhAqJD5ZfwZQC0VRo&s=tSci2Q1bJj0cQnBHjjOwtZjjx9uNMoN5Bi-ABG0Q7Y4&e=
>> > > OUTPUT: [DEBUG : metadataPrepare - 2016/04/02 22:12:18] - addMetadata: metadata for file_host are not in array format.Converting..
>> > > OUTPUT: [DEBUG : metadataPrepare - 2016/04/02 22:12:18] - addMetadata: adding key/value [file_host]/[ip-192-168-8-66]
>> > > OUTPUT: [DEBUG : metadataPrepare - 2016/04/02 22:12:18] - addMetadata: metadata for ProductType are not in array format.Converting..
>> > > OUTPUT: [DEBUG : metadataPrepare - 2016/04/02 22:12:18] - addMetadata: adding key/value [ProductType]/[GenericFile]
>> > > OUTPUT: [DEBUG : metadataPrepare - 2016/04/02 22:12:18] - addMetadata: metadata for ingest_user are not in array format.Converting..
>> > > OUTPUT: [DEBUG : metadataPrepare - 2016/04/02 22:12:18] - addMetadata: adding key/value [ingest_user]/[kmavrommatis]
>> > > OUTPUT: [DEBUG : MetExtractorNGS - 2016/04/02 22:12:18] - The file path is ARRAY(0x22d3f48). It will be added under the FilePath metadata field
>> > > OUTPUT: [DEBUG : metadataPrepare - 2016/04/02 22:12:18] - addMetadata: metadata for FilePath are not in array format.Converting..
>> > > OUTPUT: [DEBUG : metadataPrepare - 2016/04/02 22:12:18] - addMetadata: adding key/value [FilePath]/[/mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/test]
>> > > OUTPUT: [DEBUG : MetExtractorNGS - 2016/04/02 22:12:18] - This file is of type text
>> > > OUTPUT: [DEBUG : MetExtractorNGS - 2016/04/02 22:12:18] - Storing metadata in file /mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/test.met
>> > > OUTPUT: [DEBUG : metadataPrepare - 2016/04/02 22:12:18] - Changing /mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/test to
>> > > OUTPUT: [DEBUG : metadataPrepare - 2016/04/02 22:12:18] - /mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/test
>> > > OUTPUT: [DEBUG : metadataPrepare - 2016/04/02 22:12:18] - Changing kmavrommatis to
>> > > OUTPUT: [DEBUG : metadataPrepare - 2016/04/02 22:12:18] - kmavrommatis
>> > > OUTPUT: [DEBUG : metadataPrepare - 2016/04/02 22:12:18] - Changing GenericFile to
>> > > OUTPUT: [DEBUG : metadataPrepare - 2016/04/02 22:12:18] - GenericFile
>> > > OUTPUT: [DEBUG : metadataPrepare - 2016/04/02 22:12:18] - Changing ip-192-168-8-66 to
>> > > OUTPUT: [DEBUG : metadataPrepare - 2016/04/02 22:12:18] - ip-192-168-8-66
>> > > OUTPUT: [DEBUG : MetExtractorNGS - 2016/04/02 22:12:19] - Process finished SUCCESSFULLY
>> > > Apr 02, 2016 10:12:19 PM org.apache.oodt.cas.metadata.extractors.ExternMetExtractor extrMetadata
>> > > INFO: Met extraction successful for product file: [/mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/test]
>> > > Apr 02, 2016 10:12:19 PM org.apache.oodt.cas.crawl.ProductCrawler ingest
>> > > INFO: ProductCrawler: Ready to ingest product: [/mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/test]: ProductType: [GenericFile]
>> > > Apr 02, 2016 10:12:19 PM org.apache.oodt.cas.filemgr.ingest.StdIngester setFileManager
>> > > INFO: StdIngester: connected to file manager: [https://urldefense.proofpoint.com/v2/url?u=http-3A__192.168.8.44-3A9000&d=CwIBaQ&c=CZZujK3G2KuXGEKnzq-Hrg&r=wndYZ4MLMT9l3Zb2WZv2hq2O6yvZ1Cs-T2gHY95y7ZA&m=Qaz0eKz7FHe35NMF43A17ey59ANhAqJD5ZfwZQC0VRo&s=OvpwZVR1XqgKclL83VXAWh__c7nz87xK_nS-O7hIXqc&e= ]
>> > > Apr 02, 2016 10:12:19 PM org.apache.oodt.cas.filemgr.datatransfer.InPlaceDataTransferer setFileManagerUrl
>> > > INFO: In Place Data Transfer to: [https://urldefense.proofpoint.com/v2/url?u=http-3A__192.168.8.44-3A9000&d=CwIBaQ&c=CZZujK3G2KuXGEKnzq-Hrg&r=wndYZ4MLMT9l3Zb2WZv2hq2O6yvZ1Cs-T2gHY95y7ZA&m=Qaz0eKz7FHe35NMF43A17ey59ANhAqJD5ZfwZQC0VRo&s=OvpwZVR1XqgKclL83VXAWh__c7nz87xK_nS-O7hIXqc&e= ] enabled
>> > > Apr 02, 2016 10:12:19 PM org.apache.oodt.cas.filemgr.ingest.StdIngester ingest
>> > > INFO: StdIngester: ingesting product: ProductName: [test]: ProductType: [GenericFile]: FileLocation: [/mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/]
>> > > Apr 02, 2016 10:12:19 PM org.apache.oodt.cas.filemgr.system.XmlRpcFileManagerClient ingestProduct
>> > > FINEST: File Manager Client: clientTransfer enabled: transfering product [test]
>> > > Apr 02, 2016 10:12:19 PM org.apache.oodt.cas.filemgr.versioning.VersioningUtils createBasicDataStoreRefsFlat
>> > > FINE: VersioningUtils: Generated data store ref: file:/opt/oodt/data/archive/test/test from origRef: file:/mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/test
>> > > Apr 02, 2016 10:12:19 PM org.apache.oodt.cas.crawl.ProductCrawler ingest
>> > > INFO: Successfully ingested product: [/mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/test]: product id: 4c8de2da-265a-48c4-8380-3f1103dfecfc
>> > > Apr 02, 2016 10:12:19 PM org.apache.oodt.cas.crawl.ProductCrawler handleFile
>> > > INFO: Successful ingest of product: [/mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/test]
>> > >
>> > > *********************************************************
>> > > THIS ELECTRONIC MAIL MESSAGE AND ANY ATTACHMENT IS CONFIDENTIAL AND MAY CONTAIN LEGALLY PRIVILEGED INFORMATION INTENDED ONLY FOR THE USE OF THE INDIVIDUAL OR INDIVIDUALS NAMED ABOVE.
>> > > If the reader is not the intended recipient, or the employee or agent responsible to deliver it to the intended recipient, you are hereby notified that any dissemination, distribution or copying of this communication is strictly prohibited.
>> > > If you have received this communication in error, please reply to the sender to notify us of the error and delete the original message. Thank You.
>> >
>> > --
>> > *Lewis*
>>
>> --
>> *Lewis*