[
https://issues.apache.org/jira/browse/PIO-143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Yarish George updated PIO-143:
------------------------------
Description:
In my HBase I have two events:
{code:javascript}
{"event" : "$set", "entityType" : "user", "entityId" : "100", "properties" :
{"test" : "0.8"}, "eventTime": "2017-12-19T21:02:49.228Z"}
{"event" : "$unset", "entityType" : "user", "entityId" : "100", "properties" :
{"test" : null}, "eventTime": "2017-12-19T21:02:55.228Z"}
{code}
I'm trying to compress these events, but the actual result is:
{code:java}
17/12/22 16:56:44 ERROR Executor: Exception in task 0.0 in stage 3.0 (TID 3)
java.io.NotSerializableException: scala.collection.immutable.MapLike$$anon$1
Serialization stack:
    - object not serializable (class: scala.collection.immutable.MapLike$$anon$1, value: Map())
    - field (class: org.apache.predictionio.data.storage.DataMap, name: fields, type: interface scala.collection.immutable.Map)
    - object (class org.apache.predictionio.data.storage.DataMap, DataMap(Map()))
    - field (class: org.apache.predictionio.data.storage.Event, name: properties, type: class org.apache.predictionio.data.storage.DataMap)
    - object (class org.apache.predictionio.data.storage.Event, Event(id=Some(oE62rWfkR2R4rjazOnUUagAAAWBwlgmMtO4Dcl8MjO0),event=$set,eType=user,eId=49006307,tType=None,tId=None,p=DataMap(Map()),t=2017-12-19T21:02:55.228Z,tags=List(),pKey=None,ct=2017-12-23T00:18:56.858Z))
    at org.apache.spark.serializer.SerializationDebugger$.improveException(SerializationDebugger.scala:40)
    at org.apache.spark.serializer.JavaSerializationStream.writeObject(JavaSerializer.scala:47)
    at org.apache.spark.serializer.SerializationStream.writeKey(Serializer.scala:145)
    at org.apache.spark.storage.DiskBlockObjectWriter.write(DiskBlockObjectWriter.scala:184)
    at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:151)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
    at org.apache.spark.scheduler.Task.run(Task.scala:89)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
{code}
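By contrast, here is the result I would expect compression to produce for these two events, as a minimal standalone sketch (my own illustration of the intended merge, not PredictionIO code): the $unset drops the keys it names from the preceding $set, which for this pair leaves an empty property map.
{code:scala}
// Standalone sketch of the merge I expect (illustration only, not PredictionIO source).
object ExpectedMerge {
  def main(args: Array[String]): Unit = {
    val setProps: Map[String, Any]   = Map("test" -> "0.8") // properties of the $set event
    val unsetProps: Map[String, Any] = Map("test" -> null)  // properties of the $unset event

    // Keep only the $set fields that the later $unset does not name.
    val merged = setProps.filter { case (k, _) => !unsetProps.contains(k) }

    println(merged) // Map() -- matches the DataMap(Map()) in the failing Event above
  }
}
{code}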
My setup is:
PIO platform = 0.11.0-incubating
Universal Recommender (UR) = 0.6.0
Scala = 2.10.6
Elasticsearch = 1.7.5

My guess is that SelfCleaningDataSource produces a non-serializable map at line 306:
{code:java}
// filterKeys returns a lazy Map view (MapLike$$anon$1) that is not Serializable
e1.properties.fields
  .filterKeys(f => !e2.properties.fields.contains(f))
{code}
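To back up that guess, here is a small self-contained sketch (my own reproduction attempt, not project code): on Scala 2.10, filterKeys returns a lazy view, the anonymous MapLike class named in the stack trace, and that view fails Java serialization, while forcing a concrete Map first serializes fine.
{code:scala}
import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}

// Self-contained reproduction sketch (my assumption about the cause, not PredictionIO code).
object FilterKeysRepro {
  def javaSerialize(obj: AnyRef): Unit = {
    val oos = new ObjectOutputStream(new ByteArrayOutputStream())
    try oos.writeObject(obj) finally oos.close()
  }

  def main(args: Array[String]): Unit = {
    val fields = Map("test" -> "0.8")

    // filterKeys returns a lazy view (scala.collection.immutable.MapLike$$anon$1 on 2.10)
    // that is not Serializable, so it blows up when Spark shuffles the Event.
    val view = fields.filterKeys(k => k != "test")

    // Possible workaround: materialize a plain Map before it is stored in the Event,
    // e.g. with map(identity) or an equivalent filter over the entries.
    val materialized = fields.filterKeys(k => k != "test").map(identity)

    javaSerialize(materialized) // ok
    try javaSerialize(view)
    catch { case e: NotSerializableException => println(s"same failure as the stack trace: $e") }
  }
}
{code}
If that is the cause, presumably the fix in SelfCleaningDataSource is simply to materialize the filtered map (for example with map(identity)) before putting it back into the DataMap.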
> $set - $unset events compression doesn't work
> ---------------------------------------------
>
> Key: PIO-143
> URL: https://issues.apache.org/jira/browse/PIO-143
> Project: PredictionIO
> Issue Type: Bug
> Reporter: Yarish George
>
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)