[
https://issues.apache.org/jira/browse/SPARK-13009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15126936#comment-15126936
]
Andrew Davidson commented on SPARK-13009:
-----------------------------------------
Agreed. They asked me to file a Spark RFE. I will post your comment back to
them and see what they say.
If the StatusJSONImpl class were not marked final, it would be easy for Spark
to create the wrapper.
Andy
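Because a final class cannot be subclassed, one alternative is a wrapper built by composition. The following is only a minimal sketch of that idea: `WithRawJson` is a hypothetical name, it is not part of Spark or Twitter4J, and a `Long` stands in for the parsed status object so the example stays self-contained.

```java
import java.util.Objects;

// Hypothetical holder (not a Spark or Twitter4J class): pairs a parsed
// object with the raw JSON text it was parsed from.
public final class WithRawJson<T> {
    private final T parsed;
    private final String rawJson;

    public WithRawJson(T parsed, String rawJson) {
        this.parsed = Objects.requireNonNull(parsed, "parsed");
        this.rawJson = Objects.requireNonNull(rawJson, "rawJson");
    }

    // The parsed object, for analysis code.
    public T getParsed() {
        return parsed;
    }

    // The untouched JSON text, for downstream systems.
    public String getRawJson() {
        return rawJson;
    }

    public static void main(String[] args) {
        // A Long stands in for the parsed status object in this sketch.
        WithRawJson<Long> tweet = new WithRawJson<>(42L, "{\"id\": 42}");
        System.out.println(tweet.getRawJson());
    }
}
```

A holder like this is not itself a Status, so code that expects a stream of Status objects would have to unwrap it first, which is the compatibility concern raised in the issue description.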
> spark-streaming-twitter_2.10 does not make it possible to access the raw
> twitter json
> -------------------------------------------------------------------------------------
>
> Key: SPARK-13009
> URL: https://issues.apache.org/jira/browse/SPARK-13009
> Project: Spark
> Issue Type: Improvement
> Components: Streaming
> Affects Versions: 1.6.0
> Reporter: Andrew Davidson
> Priority: Blocker
> Labels: twitter
>
> The streaming-twitter package makes it easy for Java programmers to work with
> Twitter. The implementation returns the raw Twitter data in JSON format as a
> Twitter4J StatusJSONImpl object:
> JavaDStream<Status> tweets = TwitterUtils.createStream(ssc, twitterAuth);
> The Status class is different from the raw JSON, i.e. serializing the Status
> object will not be the same as the original JSON. I have downstream systems
> that can only process raw tweets, not Twitter4J Status objects.
> Here is my bug/RFE request made to Twitter4J <[email protected]>.
> They asked me to create a Spark tracking issue.
> On Thursday, January 21, 2016 at 6:27:25 PM UTC, Andy Davidson wrote:
> Hi All
> Quick problem summary:
> My system uses the Status objects to do some analysis; however, I need to
> store the raw JSON. There are other systems that process that data that are
> not written in Java.
> Currently we are serializing the Status object. The resulting JSON is going
> to break downstream systems.
> I am using the Apache Spark Streaming spark-streaming-twitter_2.10
> http://spark.apache.org/docs/latest/streaming-programming-guide.html#advanced-sources
> Request For Enhancement:
> I imagine easy access to the raw JSON is a common requirement. Would it be
> possible to add a member function getRawJson() to StatusJSONImpl? By default
> the returned value would be null unless jsonStoreEnabled=True is set in the
> config.
> Alternative implementations:
>
> It should be possible to modify spark-streaming-twitter_2.10 to provide
> this support. The solution is not very clean.
> It would require Apache Spark to define its own Status POJO. The current
> StatusJSONImpl class is marked final.
> The wrapper is not going to work nicely with existing code.
> spark-streaming-twitter_2.10 does not expose all of the Twitter streaming
> API, so many developers are writing their own implementations of
> org.apache.spark.streaming.twitter.TwitterInputDStream. This makes maintenance
> difficult. It's not easy to know when the Spark implementation for Twitter has
> changed.
> Code listing for
> spark-1.6.0/external/twitter/src/main/scala/org/apache/spark/streaming/twitter/TwitterInputDStream.scala
> private[streaming]
> class TwitterReceiver(
>     twitterAuth: Authorization,
>     filters: Seq[String],
>     storageLevel: StorageLevel
>   ) extends Receiver[Status](storageLevel) with Logging {
>   @volatile private var twitterStream: TwitterStream = _
>   @volatile private var stopped = false
>   def onStart() {
>     try {
>       val newTwitterStream = new TwitterStreamFactory().getInstance(twitterAuth)
>       newTwitterStream.addListener(new StatusListener {
>         def onStatus(status: Status): Unit = {
>           store(status)
>         }
> Ref:
> https://forum.processing.org/one/topic/saving-json-data-from-twitter4j.html
> What do people think?
> Kind regards
> Andy
> From: <[email protected]> on behalf of Igor Brigadir
> <[email protected]>
> Reply-To: <[email protected]>
> Date: Tuesday, January 19, 2016 at 5:55 AM
> To: Twitter4J <[email protected]>
> Subject: Re: [Twitter4J] trouble writing unit test
> The main issue is that the JSON object is in the wrong format,
> e.g. "createdAt": 1449775664000 should be "created_at": "Thu Dec 10 19:27:44
> +0000 2015", ...
> It looks like the JSON you have was serialized from a Java Status object,
> which makes the JSON objects different from what you get from the API.
> TwitterObjectFactory expects JSON from Twitter (I haven't had any problems
> using TwitterObjectFactory instead of the deprecated DataObjectFactory).
> You could "fix" it by matching the keys and values you have with the correct
> Twitter API JSON. It should look like the example here:
> https://dev.twitter.com/rest/reference/get/statuses/show/%3Aid
> But it might be easier to download the tweets again, but this time use
> TwitterObjectFactory.getRawJSON(status) to get the original JSON from the
> Twitter API, and save that for later. (You must have jsonStoreEnabled=True in
> your config, and call getRawJSON in the same thread as .showStatus() or
> lookup() or whatever you're using to load tweets.)
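To make the format mismatch in the quoted reply concrete, here is a minimal sketch that turns the serialized epoch-millisecond value into the date string from the reply's example. The pattern string is an assumption inferred from that single example, and `TwitterDateFormat` and `toCreatedAt` are hypothetical names, not Twitter4J or Spark APIs.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

public class TwitterDateFormat {
    // Assumed pattern, inferred from the "created_at" example in the thread.
    private static final DateTimeFormatter CREATED_AT =
        DateTimeFormatter.ofPattern("EEE MMM dd HH:mm:ss Z yyyy", Locale.ENGLISH)
            .withZone(ZoneOffset.UTC);

    // Formats epoch milliseconds (the "createdAt" value) as a UTC date string.
    public static String toCreatedAt(long epochMillis) {
        return CREATED_AT.format(Instant.ofEpochMilli(epochMillis));
    }

    public static void main(String[] args) {
        // The timestamp from the thread's example.
        System.out.println(toCreatedAt(1449775664000L));
        // prints "Thu Dec 10 19:27:44 +0000 2015"
    }
}
```

This covers a single field only. As the reply notes, the key names differ as well, so storing the raw JSON at ingestion time is likely simpler than converting every field afterwards.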