crawler returns error if we use capital letters in the name of the mimetypes or 
if we don't use "product/" at the beginning of the name.
----------------------------------------------------------------------------------------------------------------------------------------

                 Key: OODT-148
                 URL: https://issues.apache.org/jira/browse/OODT-148
             Project: OODT
          Issue Type: Bug
          Components: crawler
    Affects Versions: 0.3
         Environment: unix
            Reporter: faranak davoodi
             Fix For: 0.2


Naming mimetypes for the crawler requires a specific format that is not 
documented anywhere. It seems the name must start with "product/" and must 
be all lower case, or the crawler returns an error. The error is so general 
that you can't tell what the problem is.

I used "product/dadsL0" as the name for a mimetype and got the errors below:

Feb 24, 2011 9:20:49 PM org.apache.oodt.cas.crawl.ProductCrawler handleFile
INFO: Handling file 
/usr/local/carve/support/filemgr_lucene/carveFiles/20110209183453.dadsL0
Feb 24, 2011 9:20:49 PM org.apache.oodt.cas.crawl.AutoDetectProductCrawler 
passesPreconditions
WARNING: No extractor specs specified for 
/usr/local/carve/support/filemgr_lucene/carveFiles/20110209183453.dadsL0
Feb 24, 2011 9:20:49 PM org.apache.oodt.cas.crawl.ProductCrawler handleFile
WARNING: Failed to pass preconditions for ingest of product: 
[/usr/local/carve/support/filemgr_lucene/carveFiles/20110209183453.dadsL0]

After changing the mimetype name to "product/dadsl0" I got:

Feb 24, 2011 9:25:46 PM org.apache.oodt.cas.crawl.ProductCrawler ingest
INFO: Successfully ingested product: 
[/usr/local/carve/support/filemgr_lucene/carveFiles/20110209183453.dadsL0]: 
product id: b2c6deec-409f-11e0-9885-3f3332df0e68
Feb 24, 2011 9:25:46 PM org.apache.oodt.cas.crawl.ProductCrawler handleFile
INFO: Successful ingest of product: 
[/usr/local/carve/support/filemgr_lucene/carveFiles/20110209183453.dadsL0]

I wish the format for mimetype names weren't this sensitive. If such a 
format is necessary, then we might want to document it in the crawler's 
user guide to avoid hours of confusion.
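
To illustrate the kind of specific message that would have helped, here is a 
minimal sketch of an up-front check for the two rules observed above. The 
function name check_mimetype_name is hypothetical and is not part of OODT, 
and the rules are only what this report observed, not something taken from 
the crawler's source or documentation.

```python
# Assumption: these two rules are only what was observed in this report
# (a "product/" prefix and an all-lower-case name). They are not taken
# from the crawler's code or documentation.
REQUIRED_PREFIX = "product/"


def check_mimetype_name(name):
    """Return a list of readable problems; the list is empty if none were found."""
    problems = []
    if not name.startswith(REQUIRED_PREFIX):
        problems.append('name does not start with "product/"')
    if name != name.lower():
        problems.append("name contains upper-case letters")
    return problems


if __name__ == "__main__":
    # The two names tried in this report.
    for candidate in ("product/dadsL0", "product/dadsl0"):
        problems = check_mimetype_name(candidate)
        print(candidate, "->", "; ".join(problems) if problems else "ok")
```

A check like this, run when the mimetype configuration is loaded, could name 
the offending mimetype directly instead of failing later with "Failed to pass 
preconditions".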

Thanks.


-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
