GitHub user nickwallen opened a pull request:
https://github.com/apache/metron/pull/1012
METRON-1551 Profiler Should Not Use Java Serialization
When running the Profiler in a topology where serialization occurs, the
following error happens. This can occur when the number of workers is greater
than 1.
The topology should not be using Java serialization for serializing tuple
values as this will negatively impact performance.
```
2018-05-09 10:48:35.136 o.a.s.d.executor [ERROR]
java.lang.RuntimeException: java.lang.RuntimeException:
java.io.NotSerializableException:
org.apache.metron.common.configuration.profiler.ProfileResult
at
org.apache.storm.utils.DisruptorQueue.consumeBatchToCursor(DisruptorQueue.java:485)
~[storm-core-1.1.0.2.6.4.0-91.jar:1.1.0.2.6.4.0-91]
at
org.apache.storm.utils.DisruptorQueue.consumeBatchWhenAvailable(DisruptorQueue.java:451)
~[storm-core-1.1.0.2.6.4.0-91.jar:1.1.0.2.6.4.0-91]
at
org.apache.storm.disruptor$consume_batch_when_available.invoke(disruptor.clj:73)
~[storm-core-1.1.0.2.6.4.0-91.jar:1.1.0.2.6.4.0-91]
at
org.apache.storm.disruptor$consume_loop_STAR_$fn__7183.invoke(disruptor.clj:83)
~[storm-core-1.1.0.2.6.4.0-91.jar:1.1.0.2.6.4.0-91]
at org.apache.storm.util$async_loop$fn__553.invoke(util.clj:484)
[storm-core-1.1.0.2.6.4.0-91.jar:1.1.0.2.6.4.0-91]
at clojure.lang.AFn.run(AFn.java:22) [clojure-1.7.0.jar:?]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_162]
Caused by: java.lang.RuntimeException: java.io.NotSerializableException:
org.apache.metron.common.configuration.profiler.ProfileResult
at
org.apache.storm.serialization.SerializableSerializer.write(SerializableSerializer.java:41)
~[storm-core-1.1.0.2.6.4.0-91.jar:1.1.0.2.6.4.0-91]
at com.esotericsoftware.kryo.Kryo.writeClassAndObject(Kryo.java:628)
~[kryo-3.0.3.jar:?]
at
com.esotericsoftware.kryo.serializers.CollectionSerializer.write(CollectionSerializer.java:100)
~[kryo-3.0.3.jar:?]
at
com.esotericsoftware.kryo.serializers.CollectionSerializer.write(CollectionSerializer.java:40)
~[kryo-3.0.3.jar:?]
at com.esotericsoftware.kryo.Kryo.writeObject(Kryo.java:534)
~[kryo-3.0.3.jar:?]
at
org.apache.storm.serialization.KryoValuesSerializer.serializeInto(KryoValuesSerializer.java:44)
~[storm-core-1.1.0.2.6.4.0-91.jar:1.1.0.2.6.4.0-91]
at
org.apache.storm.serialization.KryoTupleSerializer.serialize(KryoTupleSerializer.java:44)
~[storm-core-1.1.0.2.6.4.0-91.jar:1.1.0.2.6.4.0-91]
at
org.apache.storm.daemon.worker$mk_transfer_fn$transfer_fn__7805.invoke(worker.clj:193)
~[storm-core-1.1.0.2.6.4.0-91.jar:1.1.0.2.6.4.0-91]
at
org.apache.storm.daemon.executor$start_batch_transfer__GT_worker_handler_BANG_$fn__7430.invoke(executor.clj:309)
~[storm-core-1.1.0.2.6.4.0-91.jar:1.1.0.2.6.4.0-91]
at
org.apache.storm.disruptor$clojure_handler$reify__7166.onEvent(disruptor.clj:40)
~[storm-core-1.1.0.2.6.4.0-91.jar:1.1.0.2.6.4.0-91]
at
org.apache.storm.utils.DisruptorQueue.consumeBatchToCursor(DisruptorQueue.java:472)
~[storm-core-1.1.0.2.6.4.0-91.jar:1.1.0.2.6.4.0-91]
... 6 more
Caused by: java.io.NotSerializableException:
org.apache.metron.common.configuration.profiler.ProfileResult
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1184)
~[?:1.8.0_162]
at
java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548)
~[?:1.8.0_162]
at
java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509)
~[?:1.8.0_162]
at
java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432)
~[?:1.8.0_162]
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
~[?:1.8.0_162]
at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:348)
~[?:1.8.0_162]
at
org.apache.storm.serialization.SerializableSerializer.write(SerializableSerializer.java:38)
~[storm-core-1.1.0.2.6.4.0-91.jar:1.1.0.2.6.4.0-91]
at com.esotericsoftware.kryo.Kryo.writeClassAndObject(Kryo.java:628)
~[kryo-3.0.3.jar:?]
at
com.esotericsoftware.kryo.serializers.CollectionSerializer.write(CollectionSerializer.java:100)
~[kryo-3.0.3.jar:?]
at
com.esotericsoftware.kryo.serializers.CollectionSerializer.write(CollectionSerializer.java:40)
~[kryo-3.0.3.jar:?]
at com.esotericsoftware.kryo.Kryo.writeObject(Kryo.java:534)
~[kryo-3.0.3.jar:?]
at
org.apache.storm.serialization.KryoValuesSerializer.serializeInto(KryoValuesSerializer.java:44)
~[storm-core-1.1.0.2.6.4.0-91.jar:1.1.0.2.6.4.0-91]
at
org.apache.storm.serialization.KryoTupleSerializer.serialize(KryoTupleSerializer.java:44)
~[storm-core-1.1.0.2.6.4.0-91.jar:1.1.0.2.6.4.0-91]
at
org.apache.storm.daemon.worker$mk_transfer_fn$transfer_fn__7805.invoke(worker.clj:193)
~[storm-core-1.1.0.2.6.4.0-91.jar:1.1.0.2.6.4.0-91]
at
org.apache.storm.daemon.executor$start_batch_transfer__GT_worker_handler_BANG_$fn__7430.invoke(executor.clj:309)
~[storm-core-1.1.0.2.6.4.0-91.jar:1.1.0.2.6.4.0-91]
at
org.apache.storm.disruptor$clojure_handler$reify__7166.onEvent(disruptor.clj:40)
~[storm-core-1.1.0.2.6.4.0-91.jar:1.1.0.2.6.4.0-91]
at
org.apache.storm.utils.DisruptorQueue.consumeBatchToCursor(DisruptorQueue.java:472)
~[storm-core-1.1.0.2.6.4.0-91.jar:1.1.0.2.6.4.0-91]
... 6 more
```
### Changes
* Defined a configuration property `topology.kryo.register` for Storm that
specifies the classes which should use Kryo serialization. Storm requires that
you specifically tell it which classes should undergo Kryo serialization.
* If an advanced user creates a profile that results in a user-defined type
(a class not already in `topology.kryo.register`) this property allows them to
register the class for Kryo serialization.
* Updated the existing integration tests to force them to serialize tuples
and to fail if Java serialization is used.
* Created another integration test that creates a profile that uses the
STATS library. This ensures that the STATS library can be Kryo serialized.
* Updated the README to reflect this change.
### Manual Testing
1. Launch the development environment.
1. Alter the Profiler properties.
Set the Profiler duration to 1 minute.
```
profiler.period.duration=1
profiler.period.duration.units=MINUTES
```
Force the topology to fail if Kryo serialization is not used.
```
topology.fall.back.on.java.serialization=false
```
Force the topology to serialize values, even with a single worker.
```
topology.testing.always.try.serialize=true
```
1. Restart the Profiler.
```
source /etc/default/metron
storm kill profiler
$METRON_HOME/bin/start_profiler_topology.sh
```
1. Create a Profile that uses the STATS library.
```
$METRON_HOME/bin/stellar -z $ZOOKEEPER
```
```
[Stellar]>>> conf := SHELL_EDIT()
{
"profiles": [
{
"profile": "profile-with-stats",
"foreach": "'global'",
"init": { "stats": "STATS_INIT()" },
"update": { "stats": "STATS_ADD(stats, 1)" },
"result": "stats"
}
],
"timestampField": "timestamp"
}
[Stellar]>>> CONFIG_PUT("PROFILER",conf)
```
1. Wait until a flush event occurs, then attempt to retrieve the profile
values. If values can be retrieved, the change has been successful.
```
[Stellar]>>> stats := PROFILE_GET("profile-with-stats", "global",
PROFILE_FIXED(30, "DAYS"))
[org.apache.metron.statistics.OnlineStatisticsProvider@c8bab446,
org.apache.metron.statistics.OnlineStatisticsProvider@2990520a,
org.apache.metron.statistics.OnlineStatisticsProvider@f6affafd,
org.apache.metron.statistics.OnlineStatisticsProvider@de0d511,
org.apache.metron.statistics.OnlineStatisticsProvider@9247daa4,
org.apache.metron.statistics.OnlineStatisticsProvider@ddef62a6,
org.apache.metron.statistics.OnlineStatisticsProvider@2e6874a6,
org.apache.metron.statistics.OnlineStatisticsProvider@8568c433,
org.apache.metron.statistics.OnlineStatisticsProvider@dc65d27e,
org.apache.metron.statistics.OnlineStatisticsProvider@28b2e728]
```
```
[Stellar]>>> STATS_MEAN(STATS_MERGE(stats))
1.0
```
## Pull Request Checklist
- [ ] Is there a JIRA ticket associated with this PR? If not one needs to
be created at [Metron
Jira](https://issues.apache.org/jira/browse/METRON/?selectedTab=com.atlassian.jira.jira-projects-plugin:summary-panel).
- [ ] Does your PR title start with METRON-XXXX where XXXX is the JIRA
number you are trying to resolve? Pay particular attention to the hyphen "-"
character.
- [ ] Has your PR been rebased against the latest commit within the target
branch (typically master)?
- [ ] Have you included steps to reproduce the behavior or problem that is
being changed or addressed?
- [ ] Have you included steps or a guide to how the change may be verified
and tested manually?
- [ ] Have you ensured that the full suite of tests and checks have been
executed in the root metron folder via:
- [ ] Have you written or updated unit tests and or integration tests to
verify your changes?
- [ ] If adding new dependencies to the code, are these dependencies
licensed in a way that is compatible for inclusion under [ASF
2.0](http://www.apache.org/legal/resolved.html#category-a)?
- [ ] Have you verified the basic functionality of the build by building
and running locally with Vagrant full-dev environment or the equivalent?
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/nickwallen/metron METRON-1551
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/metron/pull/1012.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #1012
----
commit 387cd2de46b6a2392d1fe7a537fc8107b3bf4a59
Author: Nick Allen <nick@...>
Date: 2018-05-11T19:22:00Z
METRON-1551 Profiler Should Not Use Java Serialization
----
---