[
https://issues.apache.org/jira/browse/NUTCH-1001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13126413#comment-13126413
]
Gabriele Kahlout edited comment on NUTCH-1001 at 10/13/11 7:47 AM:
-------------------------------------------------------------------
Hi Lewis,
Regarding indentation, I can fix that by redoing the changes without
re-indenting. As a side note, [in Maven it's possible to enforce an indentation
style|http://maven.apache.org/plugins/maven-checkstyle-plugin/] (and have each
developer override it locally). Is there something similar for Ant? It's a pity
to be bogged down by such issues.
Regarding a patch for 2.0, I could try to write one (I have never done so
before) once we agree on the solution.
Regarding -dir, I see Markus's point: consistency. My thinking is very simple,
though: never make the user think about what the program could figure out on
its own. In this case, that spares the developer from having to learn about
-dir, remember to use it, and deal with errors when it is omitted. So I'd argue
for getting rid of -dir elsewhere too.
I can see adding -dir as 'redundant' syntax, i.e. not using it still works, but
using it means that the argument must be a directory, or else the command fails.
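A rough sketch of that rule as a shell function, for local file systems only.
The name resolve_segments is hypothetical, and so is the test for "is this a
segment" (here: the directory contains crawl_generate). Reading "directory" as
"directory of segments" is also an assumption. None of this is taken from the
attached patch.

```shell
# Hypothetical helper, not part of bin/nutch: prints the segment(s) that a
# fetch/parse call would act on for the given argument.

# Assumption: a segment is a directory that contains crawl_generate.
is_segment() {
  [ -d "$1/crawl_generate" ]
}

resolve_segments() {
  strict=0
  if [ "$1" = "-dir" ]; then
    strict=1
    shift
  fi
  path=${1%/}
  if [ ! -d "$path" ]; then
    echo "not a directory: $path" >&2
    return 1
  fi
  if is_segment "$path"; then
    # A single segment is fine unless the caller insisted on -dir.
    if [ "$strict" -eq 1 ]; then
      echo "-dir given, but $path is a single segment" >&2
      return 1
    fi
    echo "$path"
    return 0
  fi
  # Otherwise treat the argument as a directory of segments.
  status=1
  for child in "$path"/*; do
    if is_segment "$child"; then
      echo "$child"
      status=0
    fi
  done
  if [ "$status" -ne 0 ]; then
    echo "no segments under $path" >&2
  fi
  return "$status"
}
```

With this, resolve_segments crawl/segments and resolve_segments -dir
crawl/segments behave the same, while -dir on a single segment is rejected.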
> bin/nutch fetch/parse handle crawl/segments directory
> -----------------------------------------------------
>
> Key: NUTCH-1001
> URL: https://issues.apache.org/jira/browse/NUTCH-1001
> Project: Nutch
> Issue Type: Improvement
> Reporter: Gabriele Kahlout
> Priority: Minor
> Fix For: 1.4, nutchgora
>
> Attachments: NUTCH-1001.patch
>
>
> I'm having issues porting scripts across different systems to support the
> step of extracting the latest/only segments resulting from the generate phase.
> Variants include:
> $ export SEGMENT=crawl/segments/`ls -tr crawl/segments|tail -1` #[1]
> $ s1=`ls -d crawl/segments/2* | tail -1` #[2]
> $ segment=`$HADOOP_HOME/bin/hadoop dfs -ls crawl/segments | tail -1 | grep -o [a-zA-Z0-9/\-]* |tail -1`
> $ segment=`$HADOOP_HOME/bin/hdfs -ls crawl/segments | tail -1 | grep -o [a-zA-Z0-9/\-]* |tail -1`
> And I'm not sure what Windows users would have to do. Some users may also
> make do with:
> bin/nutch fetch with crawl/segments/2*
> But I don't see a need for the user to extract, or worry about, the
> latest/only segment, nor for this to be a described step in every Nutch
> tutorial. Moreover, only fetch and parse expect a segment, while other
> commands are fine with the directory of segments.
> Therefore, I think it would be beneficial if fetch and parse also handled
> directories of segments.
> [1] http://www.lucidimagination.com/blog/2009/03/09/nutch-solr/
> [2] http://wiki.apache.org/nutch/NutchTutorial#Command_Line_Searching