thanks Val — Chris Mattmann [email protected]
On 4/6/16, 7:15 PM, "Mallder, Valerie" <[email protected]> wrote:

>I haven't had a chance to study this yet, but after a first pass through this
>email trail I'm suspicious that Kostas may be running into the same problem I
>ran into when Tika was either introduced or upgraded to a much newer version
>than had been in the system previously. I ended up having to modify my
>mimetypes.xml file to get around the problem I was having after that happened.
>I will look at this in detail tomorrow and compare it to my history of
>debugging when I was going from versions 0.6 to 0.7 to 0.8 to 0.9 and 0.10, and
>see if the problem is what I have seen before. However, I am staying at 0.10,
>so I won't be able to speak for going up to version 0.12.
>
>Val
>
>________________________________
>From: Chris Mattmann <[email protected]>
>Sent: Wednesday, April 6, 2016 9:58:15 PM
>To: [email protected]
>Subject: Re: Transition from OODT 0.6 to 0.12 cannot find extractor
>specifications
>
>Thanks Kostas, they are wire compatible and this is a good
>use case.
>
>The crawler should not have undergone much update (perhaps none at
>all) since 0.6, so I am not exactly sure why you were seeing
>issues with it. There are definitely upgrades to CAS-PGE since 0.6,
>and maybe that's what you were running into.
>
>—
>Chris Mattmann
>[email protected]
>
>On 4/6/16, 6:47 PM, "Konstantinos Mavrommatis" <[email protected]>
>wrote:
>
>>I am giving up on this....
>>I had used [1] in the first place to set up OODT (v0.6 back then); my setup in
>>the new system is identical to the old one.
>>I could not make much out of [0]. Among other things, I tried to copy the
>>files in the old crawler/policy to the new crawler/policy, which included
>>legacy-cmd-line-options.xml and legacy-cmd-line-actions.xml. I also tried
>>to reinstall the full OODT on the client side, but it still did not work.
>>
>>I ended up reverting to the older version (0.6), which I run on my client. The
>>server (which runs FM) is still 0.12, but the combination seems to be working
>>fine.
>>
>>K
>>
>>-----Original Message-----
>>From: Lewis John Mcgibbney [mailto:[email protected]]
>>Sent: Tuesday, April 05, 2016 3:33 AM
>>To: [email protected]
>>Subject: Re: Transition from OODT 0.6 to 0.12 cannot find extractor
>>specifications
>>
>>Hi K,
>>OK, so I did a bit of searching here and located a bunch of files which are
>>defined as legacy... you can check the search results out below:
>>https://github.com/apache/oodt/search?utf8=%E2%9C%93&q=AutoDetectProductCrawler&type=Code
>>I would urge you to have a look at the AutoDetectProductCrawler Javadoc
>>description included in the master branch [0] as well, to see if you've got
>>everything required.
>>Finally, I came across some documentation on the wiki which may guide you in
>>the right direction [1]. It may also be outdated, though, so please let us know
>>if that is the case.
>>hth
>>
>>[0]
>>https://github.com/apache/oodt/blob/91d0bafe71124906bd94baad746189caf35fb39c/crawler/src/main/java/org/apache/oodt/cas/crawl/AutoDetectProductCrawler.java#L40-L64
>>[1]
>>https://cwiki.apache.org/confluence/display/OODT/Mime+type+detection+with+the+AutoDetectProductCrawler
>>
>>On Mon, Apr 4, 2016 at 10:54 PM, Konstantinos Mavrommatis <
>>[email protected]> wrote:
>>
>>> Hi,
>>> It seems to be happening for a number of the file types that I have in
>>> the mimetypes.xml.
>>> A few things are puzzling to me: this file, which is a .gz file, is not
>>> processed by the regular Tika mimetypes, which contain the gzip files.
>>> A file that has no extension, which defaults to txt, is passed to
>>> MetExtractor.pl and processed.
>>>
>>> Any ideas how I can find out which preconditions fail? I tried to
>>> change the log level to DEBUG for all components, but I did not get
>>> much more information. This must be something that changed in the OODT
>>> releases >0.6, but I could not find anything relevant in the release notes.
>>> I also noticed in the documentation of the AutoDetectProductCrawler
>>> that it uses the file met-extr-preconditions.xml, which I could not
>>> find anywhere in the deployed OODT or the src directories. Could that
>>> be a reason for the problem I observe?
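One way to reason about which precondition fails is a toy model of the lookup that ends in "No extractor specs specified". The Python sketch below is not OODT or Tika code. It assumes, hypothetically, that the crawler takes the MIME type of the longest matching glob, falls back to text/plain when no glob matches, and then looks that type up in the extractor map; the glob tables and map entries are illustrative only. The thread only shows a text/fastq entry, so the text/plain entry is an assumption suggested by the `text` argument that appears in the crawler output quoted further down. Under these assumptions, a .fastq.gz file resolves to a gzip type with no extractor specs whenever the custom text/fastq globs are not in effect, while .met files and extension-less files fall through to text/plain and reach an extractor, which is the same pattern the log shows.

```python
from fnmatch import fnmatchcase

# Illustrative glob tables (hypothetical; not Tika's real registry).
BASE_GLOBS = {"*.gz": "application/x-gzip"}
CUSTOM_GLOBS = {  # a subset of the text/fastq globs from policy/mimetypes.xml
    "*.fastq": "text/fastq",
    "*.fastq.gz": "text/fastq",
    "*.fq": "text/fastq",
    "*.fq.gz": "text/fastq",
}

# Hypothetical extractor map. Only text/fastq appears in the thread; the
# text/plain entry is assumed from the "text" argument seen in the log.
EXTRACTOR_MAP = {
    "text/fastq": ["ExternMetExtractor + fastq.config"],
    "text/plain": ["ExternMetExtractor + (unknown text config)"],
}


def detect(name: str, globs: dict, default: str = "text/plain") -> str:
    """Return the MIME type of the longest matching glob, else `default`."""
    matches = [pattern for pattern in globs if fnmatchcase(name, pattern)]
    if not matches:
        return default
    return globs[max(matches, key=len)]


def check(name: str, globs: dict) -> tuple:
    """Return (mime_type, status) for one file name under this toy model."""
    mime = detect(name, globs)
    specs = EXTRACTOR_MAP.get(mime, [])
    return mime, ("ok" if specs else "No extractor specs specified")


if __name__ == "__main__":
    names = ["E837642_R1.fastq.gz", "E837642_R1.fastq.gz.met",
             "cas-crawler-04-02-16.log.gz", "test"]
    tables = {"base only": BASE_GLOBS,
              "base + custom": {**BASE_GLOBS, **CUSTOM_GLOBS}}
    for label, globs in tables.items():
        for name in names:
            mime, status = check(name, globs)
            print(f"{label:14} {name:28} {mime:20} {status}")
```

If the real crawler behaves like the "base only" table, a reasonable first check would be whether the edited policy/mimetypes.xml is actually the file being loaded at runtime.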
>>>
>>> Thanks
>>> K
>>>
>>> -----Original Message-----
>>> From: Lewis John Mcgibbney [mailto:[email protected]]
>>> Sent: Monday, April 04, 2016 3:24 PM
>>> To: [email protected]
>>> Subject: Re: Transition from OODT 0.6 to 0.12 cannot find extractor
>>> specifications
>>>
>>> Hi Konstantinos,
>>> It appears to be happening with a tar.gz file as well, right?
>>>
>>> WARNING: No extractor specs specified for
>>> /mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/cas-crawler-04-02-16.log.gz
>>>
>>> I wonder if it is the file names... However, I would be extremely
>>> surprised, as I've seen much more verbose file naming.
>>> Lewis
>>>
>>> On Saturday, April 2, 2016, Konstantinos Mavrommatis <
>>> [email protected]> wrote:
>>>
>>> > Hi,
>>> > I am trying to replicate a fully functional service that I set up
>>> > a long time ago using OODT 0.6, but I am having the following problem,
>>> > which does not allow me to ingest files. When I try to ingest files
>>> > with the extension fastq.gz, I get the line:
>>> > WARNING: No extractor specs specified for /mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/E837642_R1.fastq.gz
>>> > Apr 02, 2016 10:12:14 PM org.apache.oodt.cas.crawl.ProductCrawler handleFile
>>> > And of course the file is not ingested. This process works without
>>> > problems with OODT 0.6 on a different server.
>>> >
>>> > The crawler command I am running is:
>>> > ./crawler_launcher \
>>> >   --operation \
>>> >   --launchAutoCrawler \
>>> >   --productPath $FILEPATH \
>>> >   --filemgrUrl $OODT_FILEMGR_URL \
>>> >   --clientTransferer org.apache.oodt.cas.filemgr.datatransfer.InPlaceDataTransferFactory \
>>> >   --mimeExtractorRepo ../policy/mime-extractor-map.xml \
>>> >   --noRecur \
>>> >   --crawlForDirs 2>&1
>>> >
>>> > I have set up OODT 0.12 on a server which runs FM listening on port 9000.
>>> > From a client machine I have verified that I can use FM to ingest products.
>>> > I am now trying to use the crawler to crawl and ingest all files in a
>>> > directory. Since I have non-standard MIME types in these directories,
>>> > I have done the following:
>>> > 1. Added my own MIME types in policy/mimetypes.xml, e.g.
>>> >    <mime-type type="text/fastq">
>>> >      <glob pattern="*.fastq"/>
>>> >      <glob pattern="*.fastq.gz"/>
>>> >      <glob pattern="*.fastq.bz"/>
>>> >      <glob pattern="*.fastq.bz2"/>
>>> >      <glob pattern="*.fastq.bzip"/>
>>> >      <glob pattern="*.fq"/>
>>> >      <glob pattern="*.fq.gz"/>
>>> >      <glob pattern="*.fq.bz"/>
>>> >      <glob pattern="*.fq.bz2"/>
>>> >      <glob pattern="*.fq.bzip"/>
>>> >    </mime-type>
>>> > 2. Created the file policy/mime-extractor-map.xml:
>>> >    <mime type="text/fastq">
>>> >      <extractor class="org.apache.oodt.cas.metadata.extractors.ExternMetExtractor">
>>> >        <config file="/apache-oodt/crawler/bin/fastq.config"/>
>>> >        <preCondComparators>
>>> >          <preCondComparator id="CheckThatDataFileSizeIsGreaterThanZero"/>
>>> >        </preCondComparators>
>>> >      </extractor>
>>> >    </mime>
>>> > 3. Created the file fastq.config:
>>> >    <?xml version="1.0" encoding="UTF-8"?>
>>> >    <cas:externextractor xmlns:cas="http://oodt.jpl.nasa.gov/1.0/cas">
>>> >      <exec workingDir="">
>>> >        <extractorBinPath>/apache-oodt/crawler/bin/MetExtractorNGS.pl</extractorBinPath>
>>> >        <args>
>>> >          <arg isDataFile="true"></arg>
>>> >          <arg>fastq</arg>
>>> >        </args>
>>> >      </exec>
>>> >    </cas:externextractor>
>>> >
>>> > MetExtractorNGS.pl is a small Perl script that opens the file to
>>> > be ingested, gets some information, and stores it in the .met file
>>> > that corresponds to the file to be ingested. I have manually
>>> > verified that it works as expected, producing the correct met file.
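As a cross-check on step 3, the sketch below expands fastq.config into the command line that ExternMetExtractor would be expected to run. It is a hypothetical helper, not part of OODT, and it assumes that an `<arg isDataFile="true">` is replaced by the product path while every other `<arg>` value is passed through verbatim. Under that assumption a .fastq.gz product would be invoked as `MetExtractorNGS.pl <file> fastq`; every command line in the crawler output quoted below ends in `text` instead, which hints that the files that did reach an extractor were matched by a mime-extractor-map entry other than text/fastq.

```python
import xml.etree.ElementTree as ET

# fastq.config as posted in the thread (namespace URI shown unwrapped).
FASTQ_CONFIG = """<?xml version="1.0" encoding="UTF-8"?>
<cas:externextractor xmlns:cas="http://oodt.jpl.nasa.gov/1.0/cas">
  <exec workingDir="">
    <extractorBinPath>/apache-oodt/crawler/bin/MetExtractorNGS.pl</extractorBinPath>
    <args>
      <arg isDataFile="true"></arg>
      <arg>fastq</arg>
    </args>
  </exec>
</cas:externextractor>"""


def expected_command(config_xml: str, product_path: str) -> list[str]:
    """Hypothetical expansion of an ExternMetExtractor config into argv.

    Assumes isDataFile="true" args are replaced by the product path and
    all other args are passed through verbatim.
    """
    # Encode to bytes so the UTF-8 XML declaration is accepted by the parser.
    root = ET.fromstring(config_xml.encode("utf-8"))
    exec_el = root.find("exec")  # child elements carry no namespace prefix
    cmd = [exec_el.findtext("extractorBinPath").strip()]
    for arg in exec_el.find("args").findall("arg"):
        if arg.get("isDataFile", "false").lower() == "true":
            cmd.append(product_path)
        else:
            cmd.append((arg.text or "").strip())
    return cmd


if __name__ == "__main__":
    print(expected_command(FASTQ_CONFIG, "E837642_R1.fastq.gz"))
```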
>>> >
>>> > What am I missing here? Any ideas, comments, or suggestions will be
>>> > greatly appreciated.
>>> > Thanks in advance for any help,
>>> > Kostas
>>> >
>>> > PS1 The full output from running the crawler command follows:
>>> >
>>> > Setting property 'StdProductCrawler.filemgrUrl'
>>> > Setting property 'MetExtractorProductCrawler.filemgrUrl'
>>> > Setting property 'AutoDetectProductCrawler.filemgrUrl'
>>> > Setting property 'StdProductCrawler.clientTransferer'
>>> > Setting property 'MetExtractorProductCrawler.clientTransferer'
>>> > Setting property 'AutoDetectProductCrawler.clientTransferer'
>>> > Setting property 'StdProductCrawler.noRecur'
>>> > Setting property 'MetExtractorProductCrawler.noRecur'
>>> > Setting property 'AutoDetectProductCrawler.noRecur'
>>> > Setting property 'AutoDetectProductCrawler.mimeExtractorRepo'
>>> > Setting property 'StdProductCrawler.productPath'
>>> > Setting property 'MetExtractorProductCrawler.productPath'
>>> > Setting property 'AutoDetectProductCrawler.productPath'
>>> > Apr 02, 2016 10:12:13 PM org.springframework.beans.factory.config.PropertyOverrideConfigurer processKey
>>> > FINE: Property 'AutoDetectProductCrawler.noRecur' set to value [true]
>>> > Apr 02, 2016 10:12:13 PM org.springframework.beans.factory.config.PropertyOverrideConfigurer processKey
>>> > FINE: Property 'StdProductCrawler.productPath' set to value [/mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq]
>>> > Apr 02, 2016 10:12:13 PM org.springframework.beans.factory.config.PropertyOverrideConfigurer processKey
>>> > FINE: Property 'MetExtractorProductCrawler.noRecur' set to value [true]
>>> > Apr 02, 2016 10:12:13 PM org.springframework.beans.factory.config.PropertyOverrideConfigurer processKey
>>> > FINE: Property 'AutoDetectProductCrawler.mimeExtractorRepo' set to value [../policy/mime-extractor-map.xml]
>>> > Apr 02, 2016 10:12:13 PM org.springframework.beans.factory.config.PropertyOverrideConfigurer processKey
>>> > FINE: Property 'MetExtractorProductCrawler.clientTransferer' set to value [org.apache.oodt.cas.filemgr.datatransfer.InPlaceDataTransferFactory]
>>> > Apr 02, 2016 10:12:13 PM org.springframework.beans.factory.config.PropertyOverrideConfigurer processKey
>>> > FINE: Property 'AutoDetectProductCrawler.filemgrUrl' set to value [http://192.168.8.44:9000]
>>> > Apr 02, 2016 10:12:13 PM org.springframework.beans.factory.config.PropertyOverrideConfigurer processKey
>>> > FINE: Property 'AutoDetectProductCrawler.clientTransferer' set to value [org.apache.oodt.cas.filemgr.datatransfer.InPlaceDataTransferFactory]
>>> > Apr 02, 2016 10:12:13 PM org.springframework.beans.factory.config.PropertyOverrideConfigurer processKey
>>> > FINE: Property 'StdProductCrawler.noRecur' set to value [true]
>>> > Apr 02, 2016 10:12:13 PM org.springframework.beans.factory.config.PropertyOverrideConfigurer processKey
>>> > FINE: Property 'StdProductCrawler.filemgrUrl' set to value [http://192.168.8.44:9000]
>>> > Apr 02, 2016 10:12:13 PM org.springframework.beans.factory.config.PropertyOverrideConfigurer processKey
>>> > FINE: Property 'AutoDetectProductCrawler.productPath' set to value [/mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq]
>>> > Apr 02, 2016 10:12:13 PM org.springframework.beans.factory.config.PropertyOverrideConfigurer processKey
>>> > FINE: Property 'MetExtractorProductCrawler.filemgrUrl' set to value [http://192.168.8.44:9000]
>>> > Apr 02, 2016 10:12:13 PM org.springframework.beans.factory.config.PropertyOverrideConfigurer processKey
>>> > FINE: Property 'StdProductCrawler.clientTransferer' set to value [org.apache.oodt.cas.filemgr.datatransfer.InPlaceDataTransferFactory]
>>> > Apr 02, 2016 10:12:13 PM org.springframework.beans.factory.config.PropertyOverrideConfigurer processKey
>>> > FINE: Property 'MetExtractorProductCrawler.productPath' set to value [/mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq]
>>> > Apr 02, 2016 10:12:13 PM org.apache.oodt.cas.crawl.ProductCrawler crawl
>>> > INFO: Crawling /mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq
>>> > Apr 02, 2016 10:12:13 PM org.apache.oodt.cas.crawl.ProductCrawler handleFile
>>> > INFO: Handling file /mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/E837642_R1.fastq.gz
>>> > Apr 02, 2016 10:12:14 PM org.apache.oodt.cas.crawl.AutoDetectProductCrawler passesPreconditions
>>> > WARNING: No extractor specs specified for /mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/E837642_R1.fastq.gz
>>> > Apr 02, 2016 10:12:14 PM org.apache.oodt.cas.crawl.ProductCrawler handleFile
>>> > WARNING: Failed to pass preconditions for ingest of product: [/mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/E837642_R1.fastq.gz]
>>> > Apr 02, 2016 10:12:14 PM org.apache.oodt.cas.crawl.ProductCrawler handleFile
>>> > INFO: Handling file /mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/E837642_R1.fastq.gz.met
>>> > Apr 02, 2016 10:12:14 PM org.apache.oodt.cas.metadata.preconditions.PreCondEvalUtils eval
>>> > INFO: Passed precondition comparator id CheckThatDataFileSizeIsGreaterThanZero
>>> > Apr 02, 2016 10:12:14 PM org.apache.oodt.cas.metadata.extractors.ExternMetExtractor extrMetadata
>>> > INFO: Generating met file for product file: [/mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/E837642_R1.fastq.gz.met]
>>> > Apr 02, 2016 10:12:14 PM org.apache.oodt.cas.metadata.extractors.ExternMetExtractor extrMetadata
>>> > INFO: Executing command line: [/celgene/software/apache-oodt/crawler/bin/MetExtractorNGS.pl /mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/E837642_R1.fastq.gz.met text ] with workingDir: [/mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq] to extract metadata
>>> > OUTPUT: [WARN : MetExtractorNGS - 2016/04/02 22:12:15] - Input file /mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/E837642_R1.fastq.gz.met will be ignored. .met files are not processed !
>>> >
>>> > Apr 02, 2016 10:12:15 PM org.apache.oodt.cas.crawl.ProductCrawler handleFile
>>> > SEVERE: Failed to get metadata for product : Met extractor failed to create metadata file
>>> > org.apache.oodt.cas.metadata.exceptions.MetExtractionException: Met extractor failed to create metadata file
>>> >     at org.apache.oodt.cas.metadata.extractors.ExternMetExtractor.extrMetadata(ExternMetExtractor.java:120)
>>> >     at org.apache.oodt.cas.metadata.AbstractMetExtractor.extractMetadata(AbstractMetExtractor.java:74)
>>> >     at org.apache.oodt.cas.crawl.AutoDetectProductCrawler.getMetadataForProduct(AutoDetectProductCrawler.java:84)
>>> >     at org.apache.oodt.cas.crawl.ProductCrawler.handleFile(ProductCrawler.java:136)
>>> >     at org.apache.oodt.cas.crawl.ProductCrawler.crawl(ProductCrawler.java:104)
>>> >     at org.apache.oodt.cas.crawl.ProductCrawler.crawl(ProductCrawler.java:74)
>>> >     at org.apache.oodt.cas.crawl.cli.action.CrawlerLauncherCliAction.execute(CrawlerLauncherCliAction.java:58)
>>> >     at org.apache.oodt.cas.cli.CmdLineUtility.execute(CmdLineUtility.java:331)
>>> >     at org.apache.oodt.cas.cli.CmdLineUtility.run(CmdLineUtility.java:188)
>>> >     at org.apache.oodt.cas.crawl.CrawlerLauncher.main(CrawlerLauncher.java:36)
>>> >
>>> > Apr 02, 2016 10:12:15 PM org.apache.oodt.cas.crawl.ProductCrawler handleFile
>>> > INFO: Handling file /mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/E837642_R2.fastq.gz
>>> > Apr 02, 2016 10:12:15 PM org.apache.oodt.cas.crawl.AutoDetectProductCrawler passesPreconditions
>>> > WARNING: No extractor specs specified for /mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/E837642_R2.fastq.gz
>>> > Apr 02, 2016 10:12:15 PM org.apache.oodt.cas.crawl.ProductCrawler handleFile
>>> > WARNING: Failed to pass preconditions for ingest of product: [/mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/E837642_R2.fastq.gz]
>>> > Apr 02, 2016 10:12:15 PM org.apache.oodt.cas.crawl.ProductCrawler handleFile
>>> > INFO: Handling file /mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/E837642_R2.fastq.gz.met
>>> > Apr 02, 2016 10:12:15 PM org.apache.oodt.cas.metadata.preconditions.PreCondEvalUtils eval
>>> > INFO: Passed precondition comparator id CheckThatDataFileSizeIsGreaterThanZero
>>> > Apr 02, 2016 10:12:16 PM org.apache.oodt.cas.metadata.extractors.ExternMetExtractor extrMetadata
>>> > INFO: Generating met file for product file: [/mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/E837642_R2.fastq.gz.met]
>>> > Apr 02, 2016 10:12:16 PM org.apache.oodt.cas.metadata.extractors.ExternMetExtractor extrMetadata
>>> > INFO: Executing command line: [/celgene/software/apache-oodt/crawler/bin/MetExtractorNGS.pl /mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/E837642_R2.fastq.gz.met text ] with workingDir: [/mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq] to extract metadata
>>> > OUTPUT: [WARN : MetExtractorNGS - 2016/04/02 22:12:16] - Input file /mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/E837642_R2.fastq.gz.met will be ignored. .met files are not processed !
>>> >
>>> > Apr 02, 2016 10:12:17 PM org.apache.oodt.cas.crawl.ProductCrawler handleFile
>>> > SEVERE: Failed to get metadata for product : Met extractor failed to create metadata file
>>> > org.apache.oodt.cas.metadata.exceptions.MetExtractionException: Met extractor failed to create metadata file
>>> >     at org.apache.oodt.cas.metadata.extractors.ExternMetExtractor.extrMetadata(ExternMetExtractor.java:120)
>>> >     at org.apache.oodt.cas.metadata.AbstractMetExtractor.extractMetadata(AbstractMetExtractor.java:74)
>>> >     at org.apache.oodt.cas.crawl.AutoDetectProductCrawler.getMetadataForProduct(AutoDetectProductCrawler.java:84)
>>> >     at org.apache.oodt.cas.crawl.ProductCrawler.handleFile(ProductCrawler.java:136)
>>> >     at org.apache.oodt.cas.crawl.ProductCrawler.crawl(ProductCrawler.java:104)
>>> >     at org.apache.oodt.cas.crawl.ProductCrawler.crawl(ProductCrawler.java:74)
>>> >     at org.apache.oodt.cas.crawl.cli.action.CrawlerLauncherCliAction.execute(CrawlerLauncherCliAction.java:58)
>>> >     at org.apache.oodt.cas.cli.CmdLineUtility.execute(CmdLineUtility.java:331)
>>> >     at org.apache.oodt.cas.cli.CmdLineUtility.run(CmdLineUtility.java:188)
>>> >     at org.apache.oodt.cas.crawl.CrawlerLauncher.main(CrawlerLauncher.java:36)
>>> >
>>> > Apr 02, 2016 10:12:17 PM org.apache.oodt.cas.crawl.ProductCrawler handleFile
>>> > INFO: Handling file /mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/cas-crawler-04-02-16.log.gz
>>> > Apr 02, 2016 10:12:17 PM org.apache.oodt.cas.crawl.AutoDetectProductCrawler passesPreconditions
>>> > WARNING: No extractor specs specified for /mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/cas-crawler-04-02-16.log.gz
>>> > Apr 02, 2016 10:12:17 PM org.apache.oodt.cas.crawl.ProductCrawler handleFile
>>> > WARNING: Failed to pass preconditions for ingest of product: [/mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/cas-crawler-04-02-16.log.gz]
>>> > Apr 02, 2016 10:12:17 PM org.apache.oodt.cas.crawl.ProductCrawler handleFile
>>> > INFO: Handling file /mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/cas-crawler-04-02-16.tar.gz
>>> > Apr 02, 2016 10:12:17 PM org.apache.oodt.cas.crawl.AutoDetectProductCrawler passesPreconditions
>>> > WARNING: No extractor specs specified for /mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/cas-crawler-04-02-16.tar.gz
>>> > Apr 02, 2016 10:12:17 PM org.apache.oodt.cas.crawl.ProductCrawler handleFile
>>> > WARNING: Failed to pass preconditions for ingest of product: [/mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/cas-crawler-04-02-16.tar.gz]
>>> > Apr 02, 2016 10:12:17 PM org.apache.oodt.cas.crawl.ProductCrawler handleFile
>>> > INFO: Handling file /mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/cas-crawler-mnt-celgene.rnd.combio.mmgp.external-TestSeqData-RNA-Seq-RawData-fastq-04-02-16.tar.gz
>>> > Apr 02, 2016 10:12:17 PM org.apache.oodt.cas.crawl.AutoDetectProductCrawler passesPreconditions
>>> > WARNING: No extractor specs specified for /mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/cas-crawler-mnt-celgene.rnd.combio.mmgp.external-TestSeqData-RNA-Seq-RawData-fastq-04-02-16.tar.gz
>>> > Apr 02, 2016 10:12:17 PM org.apache.oodt.cas.crawl.ProductCrawler handleFile
>>> > WARNING: Failed to pass preconditions for ingest of product: [/mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/cas-crawler-mnt-celgene.rnd.combio.mmgp.external-TestSeqData-RNA-Seq-RawData-fastq-04-02-16.tar.gz]
>>> > Apr 02, 2016 10:12:17 PM org.apache.oodt.cas.crawl.ProductCrawler handleFile
>>> > INFO: Handling file /mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/test
>>> > Apr 02, 2016 10:12:17 PM org.apache.oodt.cas.metadata.preconditions.PreCondEvalUtils eval
>>> > INFO: Passed precondition comparator id CheckThatDataFileSizeIsGreaterThanZero
>>> > Apr 02, 2016 10:12:17 PM org.apache.oodt.cas.metadata.extractors.ExternMetExtractor extrMetadata
>>> > INFO: Generating met file for product file: [/mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/test]
>>> > Apr 02, 2016 10:12:17 PM org.apache.oodt.cas.metadata.extractors.ExternMetExtractor extrMetadata
>>> > INFO: Executing command line: [/celgene/software/apache-oodt/crawler/bin/MetExtractorNGS.pl /mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/test text ] with workingDir: [/mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq] to extract metadata
>>> > OUTPUT: [DEBUG : MetExtractorNGS - 2016/04/02 22:12:18] - Accessing NGS server at http://192.168.8.44:8082/RPC2
>>> > OUTPUT: [DEBUG : metadataPrepare - 2016/04/02 22:12:18] - addMetadata: metadata for file_host are not in array format.Converting..
>>> > OUTPUT: [DEBUG : metadataPrepare - 2016/04/02 22:12:18] - addMetadata: adding key/value [file_host]/[ip-192-168-8-66]
>>> > OUTPUT: [DEBUG : metadataPrepare - 2016/04/02 22:12:18] - addMetadata: metadata for ProductType are not in array format.Converting..
>>> > OUTPUT: [DEBUG : metadataPrepare - 2016/04/02 22:12:18] - addMetadata: adding key/value [ProductType]/[GenericFile]
>>> > OUTPUT: [DEBUG : metadataPrepare - 2016/04/02 22:12:18] - addMetadata: metadata for ingest_user are not in array format.Converting..
>>> > OUTPUT: [DEBUG : metadataPrepare - 2016/04/02 22:12:18] - addMetadata: adding key/value [ingest_user]/[kmavrommatis]
>>> > OUTPUT: [DEBUG : MetExtractorNGS - 2016/04/02 22:12:18] - The file path is ARRAY(0x22d3f48). It will be added under the FilePath metadata field
>>> > OUTPUT: [DEBUG : metadataPrepare - 2016/04/02 22:12:18] - addMetadata: metadata for FilePath are not in array format.Converting..
>>> > OUTPUT: [DEBUG : metadataPrepare - 2016/04/02 22:12:18] - addMetadata: adding key/value [FilePath]/[/mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/test]
>>> > OUTPUT: [DEBUG : MetExtractorNGS - 2016/04/02 22:12:18] - This file is of type text
>>> > OUTPUT: [DEBUG : MetExtractorNGS - 2016/04/02 22:12:18] - Storing metadata in file /mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/test.met
>>> > OUTPUT: [DEBUG : metadataPrepare - 2016/04/02 22:12:18] - Changing /mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/test to
>>> > OUTPUT: [DEBUG : metadataPrepare - 2016/04/02 22:12:18] - /mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/test
>>> > OUTPUT: [DEBUG : metadataPrepare - 2016/04/02 22:12:18] - Changing kmavrommatis to
>>> > OUTPUT: [DEBUG : metadataPrepare - 2016/04/02 22:12:18] - kmavrommatis
>>> > OUTPUT: [DEBUG : metadataPrepare - 2016/04/02 22:12:18] - Changing GenericFile to
>>> > OUTPUT: [DEBUG : metadataPrepare - 2016/04/02 22:12:18] - GenericFile
>>> > OUTPUT: [DEBUG : metadataPrepare - 2016/04/02 22:12:18] - Changing ip-192-168-8-66 to
>>> > OUTPUT: [DEBUG : metadataPrepare - 2016/04/02 22:12:18] - ip-192-168-8-66
>>> > OUTPUT: [DEBUG : MetExtractorNGS - 2016/04/02 22:12:19] - Process finished SUCCESSFULLY
>>> > Apr 02, 2016 10:12:19 PM org.apache.oodt.cas.metadata.extractors.ExternMetExtractor extrMetadata
>>> > INFO: Met extraction successful for product file: [/mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/test]
>>> > Apr 02, 2016 10:12:19 PM org.apache.oodt.cas.crawl.ProductCrawler ingest
>>> > INFO: ProductCrawler: Ready to ingest product: [/mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/test]: ProductType: [GenericFile]
>>> > Apr 02, 2016 10:12:19 PM org.apache.oodt.cas.filemgr.ingest.StdIngester setFileManager
>>> > INFO: StdIngester: connected to file manager: [http://192.168.8.44:9000]
>>> > Apr 02, 2016 10:12:19 PM org.apache.oodt.cas.filemgr.datatransfer.InPlaceDataTransferer setFileManagerUrl
>>> > INFO: In Place Data Transfer to: [http://192.168.8.44:9000] enabled
>>> > Apr 02, 2016 10:12:19 PM org.apache.oodt.cas.filemgr.ingest.StdIngester ingest
>>> > INFO: StdIngester: ingesting product: ProductName: [test]: ProductType: [GenericFile]: FileLocation: [/mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/]
>>> > Apr 02, 2016 10:12:19 PM org.apache.oodt.cas.filemgr.system.XmlRpcFileManagerClient ingestProduct
>>> > FINEST: File Manager Client: clientTransfer enabled: transfering product [test]
>>> > Apr 02, 2016 10:12:19 PM org.apache.oodt.cas.filemgr.versioning.VersioningUtils createBasicDataStoreRefsFlat
>>> > FINE: VersioningUtils: Generated data store ref: file:/opt/oodt/data/archive/test/test from origRef: file:/mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/test
>>> > Apr 02, 2016 10:12:19 PM org.apache.oodt.cas.crawl.ProductCrawler ingest
>>> > INFO: Successfully ingested product: [/mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/test]: product id: 4c8de2da-265a-48c4-8380-3f1103dfecfc
>>> > Apr 02, 2016 10:12:19 PM org.apache.oodt.cas.crawl.ProductCrawler handleFile
>>> > INFO: Successful ingest of product: [/mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/test]
>>> >
>>> > *********************************************************
>>> > THIS ELECTRONIC MAIL MESSAGE AND ANY ATTACHMENT IS CONFIDENTIAL AND
>>> > MAY CONTAIN LEGALLY PRIVILEGED INFORMATION INTENDED ONLY FOR THE USE
>>> > OF THE INDIVIDUAL OR INDIVIDUALS NAMED ABOVE.
>>> > If the reader is not the intended recipient, or the employee or
>>> > agent responsible to deliver it to the intended recipient, you are
>>> > hereby notified that any dissemination, distribution or copying of
>>> > this communication is strictly prohibited. If you have received this
>>> > communication in error, please reply to the sender to notify us of
>>> > the error and delete the original message. Thank You.
>>>
>>>
>>> --
>>> *Lewis*
>>
>>
>>--
>>*Lewis*
