[
https://issues.apache.org/jira/browse/NUTCH-2166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15002328#comment-15002328
]
Michael Joyce commented on NUTCH-2166:
--------------------------------------
Small change in dump format. Instead of making a bajillion nested folders, it
seems like it might be nicer to simply use the reverse URL as the file name.
So the file for
http://bar.foo.com:8983/to/index.htm
Would dump to the encoded
<output folder>/com%2Ffoo%2Fbar%2F8983%2Fhttp%2Fto%2Findex.htm
Of course, we may then run into file name length issues this way. Perhaps
having both eventually will be useful?
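
For illustration only, here is a rough Python sketch of the mapping described
above. It is not the FileDumper code (which is Java); the function names
reverse_url_path, reverse_url_filename, and fits_in_file_name are made up for
this example, and the 255-byte limit is an assumption that matches many common
file systems but not all of them.

```python
from urllib.parse import quote, urlsplit

# Assumed limit: many file systems cap a single file name at 255 bytes.
MAX_NAME_BYTES = 255


def reverse_url_path(url: str) -> str:
    """Build the nested reverse-URL form: host labels reversed, port, scheme, path."""
    parts = urlsplit(url)
    segments = parts.hostname.split(".")[::-1]   # bar.foo.com -> com, foo, bar
    if parts.port is not None:
        segments.append(str(parts.port))
    segments.append(parts.scheme)
    path = parts.path.lstrip("/")
    if path:
        segments.append(path)
    reversed_path = "/".join(segments)
    if parts.query:
        reversed_path += "?" + parts.query
    return reversed_path


def reverse_url_filename(url: str) -> str:
    """Flatten the reverse URL into one file name by percent-encoding it."""
    # safe="" makes quote() encode "/" as %2F as well.
    return quote(reverse_url_path(url), safe="")


def fits_in_file_name(name: str, limit: int = MAX_NAME_BYTES) -> bool:
    """Check the encoded name against the assumed per-name byte limit."""
    return len(name.encode("utf-8")) <= limit


if __name__ == "__main__":
    name = reverse_url_filename("http://bar.foo.com:8983/to/index.htm")
    print(name)
    print(fits_in_file_name(name))
```

A caller could fall back to the nested folder layout whenever
fits_in_file_name returns False, which is one way of "having both".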
> Add reverse URL format to dump tool
> -----------------------------------
>
> Key: NUTCH-2166
> URL: https://issues.apache.org/jira/browse/NUTCH-2166
> Project: Nutch
> Issue Type: Improvement
> Components: tool
> Affects Versions: 2.3, 1.10
> Reporter: Michael Joyce
> Assignee: Michael Joyce
> Fix For: 2.4, 1.11
>
>
> Update the FileDumper tool with an option for dumping files to the output
> directory in reverse URL format.
> So the file for
> http://bar.foo.com:8983/to/index.html?a=b
> Would dump to
> <output folder>/com/foo/bar/8983/http/to/index.html?a=b