We'll run into file name length issues that way. Giuseppe had the same problem, and so did students from USC who used it, hence the solution we have now. I think nested directory structures are probably the best bet, with the layout made configurable.
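To make the trade-off concrete, here is a rough sketch of the two layouts using the example URLs from the quoted JIRA comment below. It is Python for brevity and is not the FileDumper code; the names `reverse_url_parts`, `dump_path`, and `layout`, and the 255-character limit, are assumptions for illustration only.

```python
from urllib.parse import urlsplit, quote

# Assumed per-name limit; many filesystems cap a single file name near 255 bytes.
MAX_NAME_LEN = 255


def reverse_url_parts(url):
    """Split a URL into reverse-URL segments: reversed host labels, port, scheme, path."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    segments = list(reversed(host.split("."))) if host else []
    if parts.port is not None:
        segments.append(str(parts.port))
    if parts.scheme:
        segments.append(parts.scheme)
    path_segments = [s for s in parts.path.split("/") if s]
    if parts.query:
        # Keep the query string attached to the last path segment.
        if path_segments:
            path_segments[-1] += "?" + parts.query
        else:
            path_segments.append("?" + parts.query)
    segments.extend(path_segments)
    return segments


def dump_path(url, layout="nested"):
    """Return the path relative to the output folder for the chosen layout."""
    segments = reverse_url_parts(url)
    if layout == "nested":
        # One directory level per segment, so each name stays short.
        return "/".join(segments)
    if layout == "flat":
        # A single percent-encoded file name; this is where the length limit bites.
        name = quote("/".join(segments), safe="")
        if len(name) > MAX_NAME_LEN:
            raise ValueError("flat file name is %d characters long" % len(name))
        return name
    raise ValueError("unknown layout: %r" % layout)
```

With this sketch, `dump_path("http://bar.foo.com:8983/to/index.htm", layout="flat")` gives `com%2Ffoo%2Fbar%2F8983%2Fhttp%2Fto%2Findex.htm`, and the nested layout for `http://bar.foo.com:8983/to/index.html?a=b` gives `com/foo/bar/8983/http/to/index.html?a=b`. A deep enough URL makes the flat name exceed the assumed limit, while the nested layout only has to keep each individual segment under it.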
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Chief Architect
Instrument Software and Science Data Systems Section (398)
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 168-519, Mailstop: 168-527
Email: [email protected]
WWW: http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Associate Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

-----Original Message-----
From: "Michael Joyce (JIRA)" <[email protected]>
Reply-To: "[email protected]" <[email protected]>
Date: Thursday, November 12, 2015 at 11:17 AM
To: "[email protected]" <[email protected]>
Subject: [jira] [Commented] (NUTCH-2166) Add reverse URL format to dump tool

>
>    [ https://issues.apache.org/jira/browse/NUTCH-2166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15002328#comment-15002328 ]
>
>Michael Joyce commented on NUTCH-2166:
>--------------------------------------
>
>Small change in dump format. Instead of making a bajillion nested folders
>it seems like it might be nicer to simple use the reverse URL as the file
>name.
>
>So the file for
>http://bar.foo.com:8983/to/index.htm
>Would dump to the encoded
><output folder>/com%2Ffoo%2Fbar%2F8983%2Fhttp%2Fto%2Findex.htm
>
>Of course, we may then run into file name length issues this way. Perhaps
>having both eventually will be useful?
>
>> Add reverse URL format to dump tool
>> -----------------------------------
>>
>>                 Key: NUTCH-2166
>>                 URL: https://issues.apache.org/jira/browse/NUTCH-2166
>>             Project: Nutch
>>          Issue Type: Improvement
>>          Components: tool
>>    Affects Versions: 2.3, 1.10
>>            Reporter: Michael Joyce
>>            Assignee: Michael Joyce
>>             Fix For: 2.4, 1.11
>>
>>
>> Update the FileDumper tool with an option for dumping files to the
>> output directory in reverse URL format.
>> So the file for
>> http://bar.foo.com:8983/to/index.html?a=b
>> Would dump to
>> <output folder>/com/foo/bar/8983/http/to/index.html?a=b
>
>
>
>--
>This message was sent by Atlassian JIRA
>(v6.3.4#6332)

