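[jira] [Commented] (HADOOP-8655) In TextInputFormat, while specifying textinputformat.record.delimiter the character/character sequences in data file similar to starting character/starting character sequence in delimiter were found missing in certain cases in the Map Output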

Robert Joseph Evans (JIRA) Tue, 21 Aug 2012 08:16:39 -0700

    [ https://issues.apache.org/jira/browse/HADOOP-8655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13438792#comment-13438792 ]


Robert Joseph Evans commented on HADOOP-8655:
---------------------------------------------

Gelesh,

The new patch looks better, but I still have a few comments.

 # Please make sure you follow the style guide. The code should follow [Sun's 
code conventions|http://java.sun.com/docs/codeconv/], except that indentation 
is 2 spaces, not 4.  There are still tabs throughout the code, and many lines 
are longer than 80 characters.  Comments count toward the 80 character limit.
 # In the test, the getTestData method is called only once and is very specific 
to that single test method.  I would prefer to see it inlined in 
testCustomDeliminator.
 # I appreciate that you want to explain what is happening in your code, but I 
don't think you need quite so many comments.  For example, you don't need to 
reference HADOOP-8654.  There should be test cases added with HADOOP-8654 to 
validate that there are no regressions.
                
> In TextInputFormat, while specifying textinputformat.record.delimiter the 
> character/character sequences in data file similar to starting 
> character/starting character sequence in delimiter were found missing in 
> certain cases in the Map Output
> -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-8655
>                 URL: https://issues.apache.org/jira/browse/HADOOP-8655
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: util
>    Affects Versions: 0.20.2
>         Environment: Linux- Ubuntu 10.04
>            Reporter: Arun A K
>              Labels: hadoop, mapreduce, textinputformat, 
> textinputformat.record.delimiter
>         Attachments: HADOOP-8654.patch, HADOOP-8655.patch, HADOOP-8655.patch, 
> MAPREDUCE-4519.patch
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> Set textinputformat.record.delimiter to "</entity>".
> Suppose the input is a text file with the following content:
> <entity><id>1</id><name>User1</name></entity><entity><id>2</id><name>User2</name></entity><entity><id>3</id><name>User3</name></entity><entity><id>4</id><name>User4</name></entity><entity><id>5</id><name>User5</name></entity>
> The Mapper was expected to get the following values:
> Value 1 - <entity><id>1</id><name>User1</name>
> Value 2 - <entity><id>2</id><name>User2</name>
> Value 3 - <entity><id>3</id><name>User3</name>
> Value 4 - <entity><id>4</id><name>User4</name>
> Value 5 - <entity><id>5</id><name>User5</name>
> Because of this bug, the Mapper instead gets values such as:
> Value 1 - entity><id>1</id><name>User1</name>
> Value 2 - <entity>id>2</id><name>User2</name>
> Value 3 - <entity><id>3id><name>User3</name>
> Value 4 - <entity><id>4</id><name>User4name>
> Value 5 - <entity><id>5</id><name>User5</name>
> The pattern shown above need not occur for values 1, 2, and 3 specifically. 
> The bug occurs at some random positions in the map input.
>  
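
For illustration only, here is a minimal, self-contained sketch of the expected
splitting behaviour described in the issue. It is not the Hadoop reader code
and not any of the attached patches. The class name DelimiterSplitSketch, the
method split, and the bufferSize parameter are hypothetical, and the input is
cut into small chunks only to imitate a reader refilling its buffer. The sketch
assumes a non-empty delimiter and keeps the partial-match state across chunk
boundaries (using a KMP failure table), so characters that merely resemble the
start of the delimiter, such as the "</" in "</id>", stay in the record.

```java
import java.util.ArrayList;
import java.util.List;

public class DelimiterSplitSketch {

  // KMP failure table: fail[i] is the length of the longest proper prefix
  // of delim[0..i] that is also a suffix of it.
  static int[] failureTable(String delim) {
    int[] fail = new int[delim.length()];
    int k = 0;
    for (int i = 1; i < delim.length(); i++) {
      while (k > 0 && delim.charAt(i) != delim.charAt(k)) {
        k = fail[k - 1];
      }
      if (delim.charAt(i) == delim.charAt(k)) {
        k++;
      }
      fail[i] = k;
    }
    return fail;
  }

  // Splits input on delim (assumed non-empty), consuming the input in
  // chunks of bufferSize characters to imitate buffer refills.
  static List<String> split(String input, String delim, int bufferSize) {
    List<String> records = new ArrayList<String>();
    int[] fail = failureTable(delim);
    StringBuilder record = new StringBuilder();
    int matched = 0; // delimiter chars matched so far; survives refills
    for (int start = 0; start < input.length(); start += bufferSize) {
      int end = Math.min(start + bufferSize, input.length());
      String chunk = input.substring(start, end);
      for (int i = 0; i < chunk.length(); i++) {
        char c = chunk.charAt(i);
        // Always keep the character; it is removed again only if the
        // full delimiter turns out to match.
        record.append(c);
        while (matched > 0 && c != delim.charAt(matched)) {
          matched = fail[matched - 1];
        }
        if (c == delim.charAt(matched)) {
          matched++;
        }
        if (matched == delim.length()) {
          record.setLength(record.length() - delim.length());
          records.add(record.toString());
          record.setLength(0);
          matched = 0;
        }
      }
    }
    if (record.length() > 0) {
      records.add(record.toString()); // trailing data without delimiter
    }
    return records;
  }

  public static void main(String[] args) {
    String data = "<entity><id>1</id><name>User1</name></entity>"
        + "<entity><id>2</id><name>User2</name></entity>";
    for (String value : split(data, "</entity>", 7)) {
      System.out.println(value);
    }
  }
}
```

The small bufferSize in main is chosen so that several partial delimiter
matches straddle a chunk boundary, which is the situation where the issue
reports characters going missing.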

