[
https://issues.apache.org/jira/browse/GORA-392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14187434#comment-14187434
]
Renato Javier Marroquín Mogrovejo commented on GORA-392:
--------------------------------------------------------
Hi Sergey,
This configuration can be set per job, per cluster, or even per query (I am
not happy about being able to set it per query, but . . . ). It can be set
with something like this:
Configuration conf = new Configuration();
conf.set("io.serializations",
"org.apache.hadoop.io.serializer.JavaSerialization," +
"org.apache.hadoop.io.serializer.WritableSerialization");
And that is it. There is also an example in
https://github.com/apache/gora/blob/master/gora-core/src/test/conf/core-site.xml
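
Since the issue quoted below comes down to the order of entries in
io.serializations, here is a minimal sketch of building that value so that a
chosen serialization always comes first. It is plain Java with no Hadoop
dependency; the class name SerializationOrder and the method prepend are
hypothetical helpers made up for illustration, not part of Gora or Hadoop.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical helper, not part of Gora or Hadoop.
final class SerializationOrder {
    private SerializationOrder() {}

    /**
     * Returns a comma-separated list with the preferred class names first,
     * followed by the existing entries in their original order, without
     * duplicates.
     */
    static String prepend(String existing, String... preferred) {
        // LinkedHashSet keeps insertion order, unlike HashSet, and
        // re-adding an element that is already present does not move it.
        Set<String> ordered = new LinkedHashSet<>(Arrays.asList(preferred));
        if (existing != null) {
            for (String name : existing.split(",")) {
                String trimmed = name.trim();
                if (!trimmed.isEmpty()) {
                    ordered.add(trimmed);
                }
            }
        }
        return String.join(",", ordered);
    }
}
```

With a Hadoop Configuration at hand, the result could then be applied with
conf.set("io.serializations", SerializationOrder.prepend(conf.get("io.serializations"), "org.apache.gora.mapreduce.PersistentSerialization")),
assuming that is the fully qualified name of PersistentSerialization in your
Gora version.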
> Move PersistentSerialization to the top of serializations list
> --------------------------------------------------------------
>
> Key: GORA-392
> URL: https://issues.apache.org/jira/browse/GORA-392
> Project: Apache Gora
> Issue Type: Improvement
> Components: gora-core
> Affects Versions: 0.5
> Reporter: Sergey Weiss
>
> While getting Nutch2 to run on Hadoop 2.3.0 + HBase 0.98.1, we
> encountered java.io.EOFExceptions like the ones described in this mail thread:
> http://www.mail-archive.com/user%40nutch.apache.org/msg12644.html
> We applied the patch mentioned there and got our setup running, but it was very
> unstable: it would fail with an ArrayIndexOutOfBounds exception whenever we
> tried to generate a batch of some 50 or more pages to fetch.
> We investigated the problem and discovered that in the working setup of Nutch2 +
> Hadoop 1.2.0 + HBase 0.94.14, PersistentDeserializer is used for
> deserialization during the reduce phase, and not
> AvroSerialization.AvroDeserializer. The reason for this sudden swap of
> deserializers lies in the GoraMapReduceUtils#setIOSerializations method. It uses
> StringUtils.joinStringArrays, which uses a HashSet under the hood and so does
> not preserve insertion order. Two more serializations were added to the
> io.serializations property in Hadoop 2.3.0 compared to Hadoop 1.2.0, and this
> results in AvroSpecificSerialization being placed at the top of the
> serializations list.
> After we patched GoraMapReduceUtils#setIOSerializations to explicitly put
> PersistentSerialization at the top of the list, the instability went away.
> Moreover, we don't even need to patch Avro now; just one simple change in
> Gora and everything works like a charm!
> So we propose moving PersistentSerialization to the top of the serializations
> list.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)