[ 
https://issues.apache.org/jira/browse/NUTCH-2441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16275657#comment-16275657
 ] 

ASF GitHub Bot commented on NUTCH-2441:
---------------------------------------

lewismc commented on a change in pull request #250: fix for NUTCH-2441 
ARG_SEGMENT fix for REST API
URL: https://github.com/apache/nutch/pull/250#discussion_r154500891
 
 

 ##########
 File path: src/java/org/apache/nutch/crawl/CrawlDb.java
 ##########
 @@ -318,16 +318,19 @@ public int run(String[] args) throws Exception {
           HadoopFSUtil.getPassDirectoriesFilter(fs));
       dirs.addAll(Arrays.asList(HadoopFSUtil.getPaths(paths)));
     }
-
-    else if(args.containsKey(Nutch.ARG_SEGMENT)) {
-      Object segments = args.get(Nutch.ARG_SEGMENT);
-      ArrayList<String> segmentList = new ArrayList<>();
-      if(segments instanceof ArrayList) {
-        segmentList = (ArrayList<String>)segments;
-      }
-      for(String segment: segmentList) {
-        dirs.add(new Path(segment));
-      }
+    else if(args.containsKey(Nutch.ARG_SEGMENTS)) {
+       Object segments = args.get(Nutch.ARG_SEGMENTS);
+       ArrayList<String> segmentList = new ArrayList<String>();      
+         if(segments instanceof ArrayList) {
+                 segmentList = (ArrayList<String>)segments;
+           }
+         else if(segments instanceof Path){
+                 segmentList.add(segments.toString());
 
 Review comment:
   Formatting should use 2-space indentation throughout this patch; please amend.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


> ARG_SEGMENT usage
> -----------------
>
>                 Key: NUTCH-2441
>                 URL: https://issues.apache.org/jira/browse/NUTCH-2441
>             Project: Nutch
>          Issue Type: Improvement
>          Components: metadata
>    Affects Versions: 1.13
>            Reporter: Semyon Semyonov
>             Fix For: 1.14
>
>         Attachments: metadataARG_SEGMENT.patch
>
>
> The constant public static final String ARG_SEGMENT = "segment" in 
> metadata/Nutch.java is not used consistently. In some classes (Fetcher and 
> ParseSegment) it is interpreted as a single segment; in others (CrawlDb, 
> LinkDb, IndexingJob) it is interpreted as an array of segments. This 
> mismatch leads to inconsistent usage of the parameter.
> After a discussion with [~wastl-nagel], the proposed solution is to accept 
> both an array and a string in all cases. This avoids introducing breaking 
> changes.
> A patch is proposed.
>  *The remaining question is refactoring: all five components share the same 
> code (two versions of the same code, to be precise). Shouldn't we extract a 
> method to reduce the duplication?*
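
To illustrate the refactoring question above, here is a minimal sketch of what
an extracted helper might look like. The class name SegmentArgs and the method
toSegmentList are hypothetical and not part of Nutch. The sketch accepts any
Collection rather than only ArrayList, which is an assumption beyond the patch
under review, and it returns plain strings so that it compiles without Hadoop
on the classpath.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

// Hypothetical helper, not part of Nutch: normalizes the value stored under
// the segment argument key into a list of segment path strings.
final class SegmentArgs {

  private SegmentArgs() {
    // utility class, no instances
  }

  /**
   * Accepts either a collection of segments or a single segment
   * (a String, or any object whose toString() yields the path, such as
   * a Hadoop Path) and always returns a list.
   */
  static List<String> toSegmentList(Object segments) {
    List<String> result = new ArrayList<>();
    if (segments == null) {
      // Argument absent: return an empty list rather than failing.
      return result;
    }
    if (segments instanceof Collection) {
      // Array-style usage: copy each non-null element.
      for (Object segment : (Collection<?>) segments) {
        if (segment != null) {
          result.add(segment.toString());
        }
      }
    } else {
      // Single-segment usage: wrap the one value.
      result.add(segments.toString());
    }
    return result;
  }
}
```

Each of the five components could then, presumably, pass the value it reads
from its argument map to this one method and wrap every returned string in a
Path, instead of repeating the instanceof checks locally.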



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
