[ https://issues.apache.org/jira/browse/AVRO-991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13187351#comment-13187351 ]
Scott Carey commented on AVRO-991:
----------------------------------
{quote}
For the record, the thinking behind the varied sync marker is that it makes
collisions less likely. In theory this is not true, but in practice my concern
was that, once a value was fixed and known, there'd be a significantly higher
probability that someone would include it in some data. Perhaps that's not
correct, though.{quote}
If the sync marker were known to have a few properties, it would reduce the
collision rate with typical Avro data written with the 'null codec':
* It could contain a sequence of bytes that cannot be interpreted as UTF-8
(e.g. insufficient or too many continuation bytes).
* It could contain a sequence of bytes that cannot be interpreted as an
Avro-encoded int or long (e.g. 10 consecutive bytes with the MSB set).
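
As a rough illustration of the two properties above, here is a minimal Python
sketch. The helper names (decode_avro_long, is_valid_utf8, rejects_as_long) and
the candidate byte run are hypothetical and not part of any Avro API. The
decoder only follows the zigzag varint layout Avro uses for int and long, and
it does not validate the value range of the final byte.

```python
def decode_avro_long(buf: bytes, pos: int = 0):
    """Decode one zigzag varint long at pos; return (value, next_pos)."""
    result = 0
    shift = 0
    for i in range(10):  # a 64-bit value needs at most 10 varint bytes
        if pos + i >= len(buf):
            raise ValueError("truncated varint")
        b = buf[pos + i]
        result |= (b & 0x7F) << shift
        if not b & 0x80:  # MSB clear: this is the last byte of the value
            value = (result >> 1) ^ -(result & 1)  # undo zigzag
            return value, pos + i + 1
        shift += 7
    raise ValueError("varint longer than 10 bytes")


def is_valid_utf8(data: bytes) -> bool:
    """True if data decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def rejects_as_long(data: bytes) -> bool:
    """True if data cannot be read as a single Avro long from offset 0."""
    try:
        decode_avro_long(data)
        return False
    except ValueError:
        return True


# Hypothetical candidate: 10 bytes of 0xFF. Every byte has the MSB set,
# and 0xFF never appears in well-formed UTF-8.
CANDIDATE_RUN = bytes([0xFF] * 10)
```

With these definitions, is_valid_utf8(CANDIDATE_RUN) is False and
rejects_as_long(CANDIDATE_RUN) is True. Such a run can still occur inside
'bytes' or 'fixed' fields, so this lowers the collision rate rather than
eliminating it.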
To achieve the above you lose some randomness, so we may have to compensate
with a couple of extra bytes.
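
To put rough numbers on that trade-off, here is a small Python sketch. It
assumes the current 16-byte marker and a hypothetical scheme in which 10 of
the marker bytes have only their MSB forced to 1, so each constrained byte
gives up one random bit. The function names are illustrative only, and other
constraints would lose a different number of bits.

```python
import math

MARKER_BYTES = 16       # length of the existing Avro sync marker
CONSTRAINED_BYTES = 10  # assumption: bytes whose MSB is forced to 1


def random_bits(total_bytes: int, constrained_bytes: int,
                bits_lost_per_byte: int = 1) -> int:
    """Bits that remain free to randomize under the assumed constraint."""
    return 8 * total_bytes - bits_lost_per_byte * constrained_bytes


def extra_bytes_to_compensate(constrained_bytes: int,
                              bits_lost_per_byte: int = 1) -> int:
    """Whole extra random bytes needed to win back the lost bits."""
    return math.ceil(constrained_bytes * bits_lost_per_byte / 8)


remaining = random_bits(MARKER_BYTES, CONSTRAINED_BYTES)  # 128 - 10
extra = extra_bytes_to_compensate(CONSTRAINED_BYTES)      # ceil(10 / 8)
```

Under these assumptions the constrained marker keeps 118 of its 128 random
bits, and 2 extra bytes bring the total back above 128.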
For each codec, there may be a byte sequence that is impossible in the encoded
data, so each codec could have its own sync marker. Files with incompatible
codecs could not be concatenated together anyway.
> Allow combining multiple Avro files within a stream. (no files on disk)
> -----------------------------------------------------------------------
>
> Key: AVRO-991
> URL: https://issues.apache.org/jira/browse/AVRO-991
> Project: Avro
> Issue Type: Improvement
> Components: java
> Affects Versions: 1.6.1
> Reporter: Frank Grimes
>
> It would be nice to be able to do as follows:
> cat file1.avro file2.avro | java -jar avro-tools.jar streamcombine > combined-file.avro
> or similarly
>
> hadoop dfs -cat hdfs://hadoop/file1.avro hdfs://hadoop/file2.avro | java -jar avro-tools.jar streamcombine | hdfs -put - hdfs://hadoop/combined-file.avro
> See the following thread for details:
> http://mail-archives.apache.org/mod_mbox/avro-user/201201.mbox/%[email protected]%3e