[ 
https://issues.apache.org/jira/browse/NUTCH-1844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14142510#comment-14142510
 ] 

Chris A. Mattmann edited comment on NUTCH-1844 at 9/21/14 4:52 PM:
-------------------------------------------------------------------

After examining the Nutch 1.2 CrawlDbConverter:
http://nutch.apache.org/apidocs/apidocs-1.2/org/apache/nutch/tools/compat/CrawlDbConverter.html

And running it:

{noformat}
[chipotle:~/tmp/nutch1.2] mattmann% java -Djava.ext.dirs=build:lib 
org.apache.nutch.tools.compat.CrawlDbConverter 
../nutch/src/testresources/testcrawl/crawldb foo -withMetadata
[chipotle:~/tmp/nutch1.2] mattmann% ls
CHANGES.txt         LICENSE.txt         README.txt          build/              
conf/               default.properties  hadoop.log          lib/                
src/
KEYS                NOTICE.txt          bin/                build.xml           
contrib/            docs/               index.html          site/
[chipotle:~/tmp/nutch1.2] mattmann% ls foo
ls: foo: No such file or directory
[chipotle:~/tmp/nutch1.2] mattmann% ls ../nutch/src/testresources/testcrawl/
crawldb/  index/    indexes/  linkdb/   segments/
[chipotle:~/tmp/nutch1.2] mattmann% ls ../nutch/src/testresources/
fetch-test-site/ test-mime-util/  testcrawl/
[chipotle:~/tmp/nutch1.2] mattmann% ls ../nutch/src/testresources/
fetch-test-site/ test-mime-util/  testcrawl/
[chipotle:~/tmp/nutch1.2] mattmann% java -Djava.ext.dirs=build:lib 
org.apache.nutch.tools.compat.CrawlDbConverter 
../nutch/src/testresources/testcrawl/crawldb 
../nutch/src/testresources/testcrawl/crawldb2 -withMetadata
[chipotle:~/tmp/nutch1.2] mattmann% ls ../nutch/src/testresources/
fetch-test-site/ test-mime-util/  testcrawl/
[chipotle:~/tmp/nutch1.2] mattmann% ls ../nutch/src/testresources/test
ls: ../nutch/src/testresources/test: No such file or directory
[chipotle:~/tmp/nutch1.2] mattmann% ls ../nutch/src/testresources/testcrawl/
crawldb/  index/    indexes/  linkdb/   segments/
[chipotle:~/tmp/nutch1.2] mattmann% java -Djava.ext.dirs=build:lib 
org.apache.nutch.tools.compat.CrawlDbConverter 
../nutch/src/testresources/testcrawl/crawldb 
../nutch/src/testresources/testcrawl/crawldb2 
[chipotle:~/tmp/nutch1.2] mattmann% ls ../nutch/src/testresources/testcrawl/
crawldb/  index/    indexes/  linkdb/   segments/
{noformat}

Whether run against:
* crawldb
* whole crawl dir
* segments

etc., it produces no output, and I can't figure out how to use it. So, rather 
than invest more time here, I propose this: if I don't hear objections within 
48 hours, I'll delete testresources/testcrawl, since it's not referenced 
anywhere in the code.
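
For anyone who wants to confirm which key type a given file in the testcrawl carries before it is removed, here is a minimal sketch that is not part of Nutch or Hadoop. It relies on the fact that a Hadoop SequenceFile starts with the bytes "SEQ" and stores the key and value class names as plain strings in its header, key first. The function name, the 512-byte window, and the example path in the usage note are assumptions made for illustration, and the check is a heuristic rather than a real header parser.

```python
import sys
from pathlib import Path

# Fully qualified class names as they appear in a SequenceFile header.
OLD_KEY = b"org.apache.hadoop.io.UTF8"
NEW_KEY = b"org.apache.hadoop.io.Text"


def sequence_file_key_hint(header: bytes) -> str:
    """Guess the key type from the leading bytes of a SequenceFile.

    Hypothetical helper: it only searches the header bytes for the two
    class names and does not decode the header fields properly.
    """
    if not header.startswith(b"SEQ"):
        return "not a SequenceFile"
    # Record where each candidate class name occurs, if at all.
    found = {}
    for name in (OLD_KEY, NEW_KEY):
        pos = header.find(name)
        if pos != -1:
            found[name] = pos
    if not found:
        return "unknown"
    # The key class name is written before the value class name,
    # so the earliest match is taken to be the key type.
    first = min(found, key=found.get)
    return "UTF8" if first == OLD_KEY else "Text"


if __name__ == "__main__":
    # Each argument is a path to a data or index file inside a crawldb
    # or segment directory.
    for arg in sys.argv[1:]:
        with Path(arg).open("rb") as handle:
            print(arg, sequence_file_key_hint(handle.read(512)))
```

Assuming the usual layout of a part directory under the crawldb, it could be pointed at something like src/testresources/testcrawl/crawldb/current/part-00000/data; that path is a guess and has not been checked against the testcrawl contents.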


> testresources/testcrawl not referenced anywhere in code.
> --------------------------------------------------------
>
>                 Key: NUTCH-1844
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1844
>             Project: Nutch
>          Issue Type: Bug
>          Components: test
>            Reporter: Chris A. Mattmann
>            Assignee: Chris A. Mattmann
>             Fix For: 1.10
>
>
> While working on NUTCH-1526 in Review Board 
> https://reviews.apache.org/r/9119/ [~lewismc] tried to test out the 
> ./bin/nutch dump tool on src/testresources/testcrawl and found that it failed 
> due to an old o.a.h.io.UTF8 key type (instead of the o.a.h.io.Text type). 
> I looked into this - how were Nutch tests passing using this old code? I 
> found that Andrzej a long time ago wrote a tool to update the index from the 
> old UTF8 key format to Text - I also found that *nowhere in the Nutch code* 
> is the testcrawl referenced.
> My suggestion: 
> * we remove the testcrawl (it's not used)
> * if we don't remove it, we at least run Andrzej's tool on it and then 
> upgrade it to use o.a.h.io.Text keys. 
> I'll take care of this.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
