[ https://issues.apache.org/jira/browse/GORA-392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sergey Weiss updated GORA-392:
------------------------------
    Comment: was deleted

(was: --- gora-core/src/main/java/org/apache/gora/mapreduce/GoraMapReduceUtils.java
+++ gora-core/src/main/java/org/apache/gora/mapreduce/GoraMapReduceUtils.java
@@ -57,14 +57,14 @@ public class GoraMapReduceUtils {
    * @param reuseObjects boolean parameter to reuse objects
    */
   public static void setIOSerializations(Configuration conf, boolean reuseObjects) {
-    String serializationClass =
-      PersistentSerialization.class.getCanonicalName();
     String[] serializations = StringUtils.joinStringArrays(
-        conf.getStrings("io.serializations"), 
+        conf.getStrings("io.serializations"),
         "org.apache.hadoop.io.serializer.WritableSerialization",
-        StringSerialization.class.getCanonicalName(),
-        serializationClass); 
-    conf.setStrings("io.serializations", serializations);
+        StringSerialization.class.getCanonicalName());
+    String[] extendedSerializations = new String[serializations.length + 1];
+    extendedSerializations[0] = PersistentSerialization.class.getCanonicalName();
+    System.arraycopy(serializations, 0, extendedSerializations, 1, serializations.length);
+    conf.setStrings("io.serializations", extendedSerializations);
   }  
   
   public static List<InputSplit> getSplits(Configuration conf, String inputPath) 
)
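The patch in the deleted comment builds the joined list first and then prepends PersistentSerialization with System.arraycopy. The standalone sketch below reproduces only that array step in plain Java, so it runs without Hadoop or Gora on the classpath. The class name PrependSerializationSketch, the helper prepend, and the sample input order are hypothetical, and the fully qualified Gora class names are assumed rather than taken from the patch.

```java
import java.util.Arrays;

public class PrependSerializationSketch {

    // Assumed fully qualified name of Gora's PersistentSerialization.
    static final String PERSISTENT =
        "org.apache.gora.mapreduce.PersistentSerialization";

    // Hypothetical helper: returns a new array with `first` at index 0,
    // followed by all elements of `rest` in their original order.
    // Like the patch, it does not remove duplicates.
    static String[] prepend(String first, String[] rest) {
        String[] extended = new String[rest.length + 1];
        extended[0] = first;
        System.arraycopy(rest, 0, extended, 1, rest.length);
        return extended;
    }

    public static void main(String[] args) {
        // Sample input standing in for whatever order the HashSet-based
        // join happens to produce; this order is illustrative only.
        String[] joined = {
            "org.apache.hadoop.io.serializer.avro.AvroSpecificSerialization",
            "org.apache.hadoop.io.serializer.WritableSerialization",
            "org.apache.gora.mapreduce.StringSerialization"
        };
        String[] result = prepend(PERSISTENT, joined);
        // PersistentSerialization is now first, regardless of the input order.
        System.out.println(Arrays.toString(result));
    }
}
```

Because the prepended entry is not de-duplicated, calling the method on a configuration that already lists PersistentSerialization would leave it in the list twice. That should be harmless for a first-match lookup; building the list with an order-preserving LinkedHashSet would be one way to avoid the duplicate, though that is not what the patch does.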

> Move PersistentSerialization to the top of serializations list
> --------------------------------------------------------------
>
>                 Key: GORA-392
>                 URL: https://issues.apache.org/jira/browse/GORA-392
>             Project: Apache Gora
>          Issue Type: Improvement
>          Components: gora-core
>    Affects Versions: 0.5
>            Reporter: Sergey Weiss
>
> In the process of making Nutch2 run on Hadoop 2.3.0 + HBase 0.98.1, we 
> encountered java.io.EOFExceptions like the ones described in this mail thread: 
> http://www.mail-archive.com/user%40nutch.apache.org/msg12644.html
> We applied the patch mentioned there and got our setup running, but it was very 
> unstable: it would fail with an ArrayIndexOutOfBoundsException whenever we 
> tried to generate a batch of roughly 50 or more pages to fetch.
> We investigated the problem and discovered that in the working setup of Nutch2 + 
> Hadoop 1.2.0 + HBase 0.94.14, PersistentDeserializer is used for 
> deserialization during the reduce phase, not 
> AvroSerialization.AvroDeserializer. The reason for this swap of 
> deserializers lies in the GoraMapReduceUtils#setIOSerializations method. It uses 
> StringUtils.joinStringArrays, and that method uses a HashSet under the hood, so 
> the order of the joined list is not preserved. Two more serializations were added 
> to the io.serializations property in Hadoop 2.3.0 compared to Hadoop 1.2.0, and 
> this results in AvroSpecificSerialization being placed at the top of the 
> serializations list.
> After we patched GoraMapReduceUtils#setIOSerializations to explicitly put 
> PersistentSerialization at the top of the list, the instability went away. 
> Moreover, we no longer need to patch Avro: one simple change in Gora and 
> everything works like a charm!
> So we propose moving PersistentSerialization to the top of the serializations 
> list.
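The report hinges on list order. As we understand Hadoop's SerializationFactory, it walks io.serializations in order and uses the first serialization whose accept(Class) returns true, so when two serializations both accept a Gora bean, whichever comes first wins. The sketch below models that first-match rule in plain Java. Entry, resolve, and the *Like marker types are hypothetical simplifications, not Hadoop, Avro, or Gora classes, and the sketch assumes (as the report implies) that AvroSpecificSerialization also accepts Gora's persistent classes.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

public class FirstMatchSketch {

    // Hypothetical, simplified model of one io.serializations entry: a name
    // plus a check playing the role of Hadoop's Serialization#accept(Class).
    static final class Entry {
        final String name;
        final Predicate<Class<?>> accepts;

        Entry(String name, Predicate<Class<?>> accepts) {
            this.name = name;
            this.accepts = accepts;
        }
    }

    // Hypothetical marker types standing in for Avro's SpecificRecord and
    // Gora's Persistent interface.
    interface SpecificRecordLike {}
    interface PersistentLike {}

    // Hypothetical stand-in for a generated Gora bean (e.g. Nutch's WebPage),
    // which both serializations would accept.
    static class WebPageLike implements SpecificRecordLike, PersistentLike {}

    static final Entry AVRO_SPECIFIC = new Entry("AvroSpecificSerialization",
        c -> SpecificRecordLike.class.isAssignableFrom(c));
    static final Entry PERSISTENT = new Entry("PersistentSerialization",
        c -> PersistentLike.class.isAssignableFrom(c));

    // Walks the list in order and returns the name of the first entry that
    // accepts the class, or null if none does.
    static String resolve(List<Entry> entries, Class<?> c) {
        for (Entry e : entries) {
            if (e.accepts.test(c)) {
                return e.name;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // Avro listed first: prints AvroSpecificSerialization
        System.out.println(resolve(Arrays.asList(AVRO_SPECIFIC, PERSISTENT), WebPageLike.class));
        // Persistent listed first: prints PersistentSerialization
        System.out.println(resolve(Arrays.asList(PERSISTENT, AVRO_SPECIFIC), WebPageLike.class));
    }
}
```

Under this model, reordering the list is enough to change which deserializer handles the same class, which is why pinning PersistentSerialization to index 0 removes the dependence on HashSet iteration order.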



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
