We'll run into file name length issues. Giuseppe had the same problem,
and so did students from USC who used it, hence the solution we have
now. I think nested directory structures are probably the best bet,
with the format made configurable.
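
To make the trade-off concrete, here is a rough Python sketch of the two
layouts discussed in this thread. It is not the FileDumper code: the names
dump_path and longest_component and the flat flag are made up for
illustration, and the 255-byte per-name limit mentioned in the comments is
an assumption about common filesystems, not something Nutch enforces.

```python
from urllib.parse import quote, urlsplit


def reverse_url_segments(url):
    """Split a URL into reverse-URL segments: reversed host, port, scheme, path."""
    parts = urlsplit(url)
    segments = list(reversed(parts.hostname.split(".")))
    if parts.port is not None:
        segments.append(str(parts.port))
    segments.append(parts.scheme)
    # Drop empty path pieces such as those produced by a trailing slash.
    segments.extend(p for p in parts.path.split("/") if p)
    if parts.query:
        # Keep the query string attached to the last segment, as in the issue text.
        segments[-1] += "?" + parts.query
    return segments


def dump_path(url, flat=False):
    """Return the relative dump path (hypothetical helper, not a Nutch API).

    flat=False -> nested directories: com/foo/bar/8983/http/to/index.html?a=b
    flat=True  -> one encoded file name, with every "/" written as %2F
    """
    nested = "/".join(reverse_url_segments(url))
    return quote(nested, safe="") if flat else nested


def longest_component(path):
    """Length in bytes of the longest single file or directory name in path."""
    return max(len(name.encode("utf-8")) for name in path.split("/"))


if __name__ == "__main__":
    url = "http://bar.foo.com:8983/to/index.htm"
    for flat in (False, True):
        path = dump_path(url, flat=flat)
        # Many filesystems cap one name at about 255 bytes (assumption).
        print(path, longest_component(path))
```

In the flat layout the whole encoded URL counts as one name, so a long URL
reaches any per-name limit much sooner than in the nested layout, where only
the longest single segment counts.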

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Chief Architect
Instrument Software and Science Data Systems Section (398)
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 168-519, Mailstop: 168-527
Email: [email protected]
WWW:  http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Associate Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

-----Original Message-----
From: "Michael Joyce (JIRA)" <[email protected]>
Reply-To: "[email protected]" <[email protected]>
Date: Thursday, November 12, 2015 at 11:17 AM
To: "[email protected]" <[email protected]>
Subject: [jira] [Commented] (NUTCH-2166) Add reverse URL format to dump
tool

>
>    [ https://issues.apache.org/jira/browse/NUTCH-2166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15002328#comment-15002328 ]
>
>Michael Joyce commented on NUTCH-2166:
>--------------------------------------
>
>Small change in dump format. Instead of making a bajillion nested folders
>it seems like it might be nicer to simply use the reverse URL as the file
>name.
>
>So the file for 
>http://bar.foo.com:8983/to/index.htm
>Would dump to the encoded
><output folder>/com%2Ffoo%2Fbar%2F8983%2Fhttp%2Fto%2Findex.htm
>
>Of course, we may then run into file name length issues this way. Perhaps
>having both eventually will be useful?
>
>> Add reverse URL format to dump tool
>> -----------------------------------
>>
>>                 Key: NUTCH-2166
>>                 URL: https://issues.apache.org/jira/browse/NUTCH-2166
>>             Project: Nutch
>>          Issue Type: Improvement
>>          Components: tool
>>    Affects Versions: 2.3, 1.10
>>            Reporter: Michael Joyce
>>            Assignee: Michael Joyce
>>             Fix For: 2.4, 1.11
>>
>>
>> Update the FileDumper tool with an option for dumping files to the
>> output directory in reverse URL format.
>> So the file for 
>> http://bar.foo.com:8983/to/index.html?a=b
>> Would dump to
>> <output folder>/com/foo/bar/8983/http/to/index.html?a=b
>
>
>
>--
>This message was sent by Atlassian JIRA
>(v6.3.4#6332)
