[
https://issues.apache.org/jira/browse/HADOOP-7760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13131725#comment-13131725
]
Owen O'Malley commented on HADOOP-7760:
---------------------------------------
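You've fallen into a classic mistake with BytesWritable. Note the comment on
the API:
[http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/io/BytesWritable.html#getBytes()]
getBytes() returns the backing array, which can be longer than the data it
holds; only the first getLength() bytes are valid. You need to write: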
{code}
new ByteArrayInputStream(value.getBytes(), 0, value.getLength())
{code}
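The effect can be reproduced without Hadoop. The following is a minimal sketch using only the JDK: the class name BackingArrayDemo, its helper methods, and the two spare bytes of padding are assumptions made for illustration, with a plain byte[] standing in for what value.getBytes() returns and an int standing in for value.getLength().

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo class; it only mimics an oversized backing array.
public class BackingArrayDemo {

    // Drains an in-memory stream into a new byte array.
    static byte[] readAll(ByteArrayInputStream in) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] chunk = new byte[64];
        int n;
        while ((n = in.read(chunk, 0, chunk.length)) != -1) {
            out.write(chunk, 0, n);
        }
        return out.toByteArray();
    }

    // The mistake: wrap the whole backing array, spare capacity included.
    static byte[] readWholeBuffer(byte[] backing) {
        return readAll(new ByteArrayInputStream(backing));
    }

    // The fix: wrap only the first `length` bytes, which are the valid data.
    static byte[] readValidPrefix(byte[] backing, int length) {
        return readAll(new ByteArrayInputStream(backing, 0, length));
    }

    public static void main(String[] args) {
        byte[] payload = "a\nb\n".getBytes(StandardCharsets.US_ASCII);
        // Simulate a buffer that has grown past the data it holds.
        byte[] backing = Arrays.copyOf(payload, payload.length + 2);

        System.out.println("whole buffer: " + readWholeBuffer(backing).length + " bytes");
        System.out.println("valid prefix: " + readValidPrefix(backing, payload.length).length + " bytes");
    }
}
```

Reading the whole buffer returns the payload plus the spare bytes, while the three-argument constructor returns exactly the payload. If a standalone array is needed rather than a stream, Arrays.copyOf(backing, length) gives a copy trimmed to the valid length.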
> BytesWritable / SequenceFile yields dummy linefeed at end as soon as content
> has one or more linefeeds.
> -------------------------------------------------------------------------------------------------------
>
> Key: HADOOP-7760
> URL: https://issues.apache.org/jira/browse/HADOOP-7760
> Project: Hadoop Common
> Issue Type: Bug
> Components: record
> Affects Versions: 0.20.2
> Environment: Easily reproducible on a Debian Linux cluster but also on
> my Arch Linux desktop.
> I am aware there are some newer releases in the 0.20 series, but all
> changelogs and release note links for those @
> http://hadoop.apache.org/common/releases.html are broken, so I can't check if
> this has been fixed and/or whether it's safe to upgrade.
> Reporter: Dieter Plaetinck
> Priority: Minor
> Original Estimate: 2h
> Remaining Estimate: 2h
>
> I create SequenceFiles which have BytesWritable as values.
> I notice that if I store content containing no linefeeds ("\n") or a single
> linefeed in the value, the value is read back from the SequenceFile
> correctly.
> However, as soon as I store input that contains two or more linefeeds (which
> is almost always the case), writing to the SequenceFile and reading the data
> back yields one *extra* linefeed at the end of the value, a linefeed that
> did not exist in the input.
> This effectively corrupts my data, although I could write a hacky
> workaround for it.
> I have written a program that demonstrates the behavior by showing what
> happens when writing 2 SequenceFiles:
> one that has a record whose value contains one linefeed,
> another that has a record whose value contains two linefeeds.
> Upon reading, the latter value will contain 3 linefeeds.
> Test file is : http://pastie.org/2728797