Re: "Got wrong record after seeking to offset" issue

2018-01-17 Thread Justin Miller
By compacted do you mean compression? If so, then we did recently turn on lz4 compression. If it means something else and there’s a command I can run to check compaction, I’m happy to give that a shot too. I’ll try consuming from the failed offset if/when the problem manifests itself again. Thanks!

good material to learn Apache Spark

2018-01-17 Thread Manuel Sopena Ballesteros
Dear Spark community, I would like to learn more about Apache Spark. I have a Hortonworks HDP platform and have run a few Spark jobs in a cluster, but now I need to know in more depth how Spark works. My main interest is the sysadmin and operational side of Spark and its ecosystem. Is there

StreamingLogisticRegressionWithSGD : Multiclass Classification : Options

2018-01-17 Thread Sundeep Kumar Mehta
Hi, I am looking for a multiclass logistic regression classifier for streaming data. Are there any alternative options, libraries, or GitHub projects? StreamingLogisticRegressionWithSGD only supports binary classification. Regards, Sundeep

Spark Stream is corrupted

2018-01-17 Thread KhajaAsmath Mohammed
Hi, I have created a streaming object from a checkpoint, but it always throws a "stream corrupted" error when I restart the Spark Streaming job. Is there any solution for this? private def createStreamingContext( sparkCheckpointDir: String, sparkSession: SparkSession, batchDuration: Int, config:

Re: "Got wrong record after seeking to offset" issue

2018-01-17 Thread Cody Koeninger
That means the consumer on the executor tried to seek to the specified offset, but the message that was returned did not have a matching offset. If the executor can't get the messages the driver told it to get, something's generally wrong. What happens when you try to consume the particular
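
The check described in this reply can be illustrated with a toy model. This is only a sketch in plain Python, not Spark's or Kafka's code: LOG, fetch_at_or_after, and get_expected are hypothetical names, and the model assumes that a fetch returns the first record at or after the requested offset. It shows how a gap in the offsets (for example, one left by log compaction) makes the returned record's offset differ from the one that was asked for.

```python
from typing import Dict, Optional, Tuple

# Hypothetical in-memory stand-in for one topic partition: offset -> value.
# Offsets 3 and 4 are absent, as they could be after log compaction.
LOG: Dict[int, str] = {0: "a", 1: "b", 2: "c", 5: "f", 6: "g"}


def fetch_at_or_after(log: Dict[int, str], offset: int) -> Optional[Tuple[int, str]]:
    """Return the first (offset, value) pair whose offset is >= the requested one."""
    candidates = [o for o in log if o >= offset]
    if not candidates:
        return None
    first = min(candidates)
    return first, log[first]


def get_expected(log: Dict[int, str], offset: int) -> str:
    """Seek to `offset` and insist that the returned record has exactly that offset."""
    record = fetch_at_or_after(log, offset)
    if record is None or record[0] != offset:
        # Illustrative message only; it is not copied from any real library.
        raise ValueError(f"wrong record after seeking to offset {offset}: {record}")
    return record[1]
```

In this model, get_expected(LOG, 2) succeeds, while get_expected(LOG, 3) raises because the first available record sits at offset 5.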

Re: update LD_LIBRARY_PATH when running apache job in a YARN cluster

2018-01-17 Thread Keith Chapman
Hi Manuel, You could use the following to add a path to the library search path, --conf spark.driver.extraLibraryPath=PathToLibFolder --conf spark.executor.extraLibraryPath=PathToLibFolder Thanks, Keith. Regards, Keith. http://keith-chapman.com On Wed, Jan 17, 2018 at 5:39 PM, Manuel Sopena
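
As a rough sketch of how the two settings above fit into a launch command, the following Python helper assembles a spark-submit argument list. The helper name spark_submit_command, the application file my_app.py, and the directory /path/to/lib/folder are placeholders invented for illustration; only the two --conf keys come from the reply.

```python
import shlex
from typing import List


def spark_submit_command(app: str, lib_dir: str, master: str = "yarn") -> List[str]:
    """Build a spark-submit argument list that adds lib_dir to the native
    library search path of both the driver and the executors."""
    return [
        "spark-submit",
        "--master", master,
        "--conf", f"spark.driver.extraLibraryPath={lib_dir}",
        "--conf", f"spark.executor.extraLibraryPath={lib_dir}",
        app,
    ]


# Placeholder application and directory; substitute real values.
cmd = spark_submit_command("my_app.py", "/path/to/lib/folder")
print(shlex.join(cmd))
```

The list form can be passed to subprocess.run directly; shlex.join is used here only to print a shell-safe version of the same command.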

update LD_LIBRARY_PATH when running apache job in a YARN cluster

2018-01-17 Thread Manuel Sopena Ballesteros
Dear Spark community, I have Spark running in a YARN cluster and I am getting an error when trying to run my Python application. /home/mansop/virtenv/bin/python2.7: error while loading shared libraries: libpython2.7.so.1.0: cannot open shared object file: No such file or directory Is

Re: Testing Spark-Cassandra

2018-01-17 Thread Guillermo Ortiz
Thanks, I'll check it ;) 2018-01-17 17:19 GMT+01:00 Alonso Isidoro Roman : > Yes, you can use Docker to build your own Cassandra ring. Depending on your > OS, instructions may change, so please follow this >

Re: Testing Spark-Cassandra

2018-01-17 Thread Alonso Isidoro Roman
Yes, you can use Docker to build your own Cassandra ring. Depending on your OS, instructions may change, so please follow this link to install it, and then follow this project,

Testing Spark-Cassandra

2018-01-17 Thread Guillermo Ortiz
Hello, I'm using Spark 2.0 and Cassandra. Is there any utility to make unit testing easier, or what would be the best way to do it? A library? Cassandra with Docker?