[ 
https://issues.apache.org/jira/browse/PHOENIX-3538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15812736#comment-15812736
 ] 

Gabriel Reid commented on PHOENIX-3538:
---------------------------------------

Sorry for such a slow reply on this [~kalyanhadoop].

There seems to be an issue with your patch file. When I try to import it, I get 
the following error:
{code}
$ patch -p1 < PHOENIX-3538-final.patch  

patching file 
phoenix-core/src/it/java/org/apache/phoenix/end2end/RegexBulkLoadToolIT.java
patching file 
phoenix-core/src/main/java/org/apache/phoenix/mapreduce/RegexBulkLoadTool.java
patching file 
phoenix-core/src/main/java/org/apache/phoenix/mapreduce/RegexToKeyValueMapper.java
patching file 
phoenix-core/src/main/java/org/apache/phoenix/util/regex/ObjectToArrayConverter.java
patching file 
phoenix-core/src/main/java/org/apache/phoenix/util/regex/RegexUpsertExecutor.java
patch unexpectedly ends in middle of line
patch: **** malformed patch at line 835:  
{code}

In any case, I took a look through the patch file, and it looks quite good. 
Just a few minor comments on things I noticed though:
* it looks like there is quite a bit of copy/paste of the existing JSON and/or 
CSV code in the RegexUpsertExecutor and ObjectToArrayConverter. There are also 
some references to JSON within these regex classes. At a minimum, the 
references to JSON in error/exeption messages should be cleaned up, and even 
better would be if the code duplication there was improved so that we've only 
got one copy of the same logic.
* While it's good that there is javadoc in RegexBulkLoadTool and 
RegexToKeyValueMapper classes, it looks to me like it was pretty much copied 
from the JSON variant of those classes. I think that there should be some 
clearer explanation of what the classes do, i.e. how the regex-extracted values 
need to match up with the columns being upserted, and how non-matching lines 
are handled. Right now the javadoc says that it takes "REGEX input lines", 
which is pretty confusing and not really what is being done by the class.


> Regex Bulkload Tool
> -------------------
>
>                 Key: PHOENIX-3538
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-3538
>             Project: Phoenix
>          Issue Type: New Feature
>            Reporter: Kalyan
>            Assignee: Kalyan
>            Priority: Minor
>         Attachments: PHOENIX-3538-final.patch, PHOENIX-3538-v1.patch, 
> PHOENIX-3538.patch
>
>
> To work with complex data , we can regex to load directly.
> Similar to JSON Bulkload Tool & CSV Bulkload Tool



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to