zhaP524 created SPARK-21948:
-------------------------------
Summary: How to use Spark Streaming to process two tables from one
Kafka topic?
Key: SPARK-21948
URL: https://issues.apache.org/jira/browse/SPARK-21948
Project: Spark
Issue Type: Bug
Components: Spark Core, Spark Submit, Structured Streaming
Affects Versions: 2.1.1
Environment: kafka:0.10.0
Spark:2.1.1
Reporter: zhaP524
I have the following requirement: I consume one Kafka topic with one consumer
group, and the resulting DirectStream contains the data of two tables
(A<Master>, B<Slave>). My goal is to join the two tables on certain fields and
write the results into a database.
My previous attempts are as follows:
1. Process each batch directly at the DirectStream layer: filter out the data
of A and B, build a DataFrame for each, join them with Spark SQL, and write the
results to the database. The problem is that A and B have a 1:N relationship.
A given batch may contain all of the A rows but only part of the matching B
rows, so the join yields only partial results. When the next batch arrives,
the A rows have already been consumed, so the remaining B rows have nothing to
join with and are lost. I don't know whether the A table data can be held in a
queue or something similar until all of the B table data has arrived, and only
then processed to produce the complete result. In essence, this is similar to
a Spark Streaming window operation.
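The buffering idea in approach 1 can be sketched in plain Python, without any
Spark API, to show the logic only. The class name MasterSlaveJoiner, the
(table, key, payload) record shape, and the sample rows are all hypothetical.
In Spark Streaming the same keyed state would presumably live in a stateful
operator such as updateStateByKey or mapWithState; that mapping is an
assumption here, not something verified against this job.

```python
from collections import defaultdict


class MasterSlaveJoiner:
    """Keeps unmatched rows between batches so a 1:N join loses nothing."""

    def __init__(self):
        # join key -> master row (table A) seen so far
        self.masters = {}
        # join key -> slave rows (table B) that arrived before their master
        self.pending_slaves = defaultdict(list)

    def process_batch(self, records):
        """records: iterable of (table, key, payload) tuples.

        Returns the joined rows that became complete in this batch.
        """
        joined = []
        for table, key, payload in records:
            if table == "A":
                self.masters[key] = payload
                # Flush slave rows that were waiting for this master row.
                for slave in self.pending_slaves.pop(key, []):
                    joined.append((key, payload, slave))
            elif table == "B":
                if key in self.masters:
                    joined.append((key, self.masters[key], payload))
                else:
                    self.pending_slaves[key].append(payload)
        return joined


# Hypothetical sample data: B rows for key 1 are split across two batches,
# and the B row for key 2 arrives before its A row.
joiner = MasterSlaveJoiner()
out1 = joiner.process_batch([("A", 1, "order-1"), ("B", 1, "item-1a")])
out2 = joiner.process_batch([("B", 1, "item-1b"), ("B", 2, "item-2a")])
out3 = joiner.process_batch([("A", 2, "order-2")])
```

Note that this sketch never evicts state, so memory grows with the number of
keys. A real job would need some rule (a timeout, or an "all B rows received"
marker in the data) to decide when a master row can be dropped.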
2. Use a Spark Streaming window operation. I can separate the A and B records
from the DirectStream, but the join has to be carried out within the window,
and the window operation does not seem to support transformations such as map,
so the job fails with:
java.io.NotSerializableException:
org.apache.kafka.clients.consumer.ConsumerRecord
Serialization stack:
- object not serializable (class:
org.apache.kafka.clients.consumer.ConsumerRecord, value: ConsumerRecord(topic =
doc, partition = 0,...)
I don't know what I should do.
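The exception in approach 2 points at ConsumerRecord itself, which is not
serializable, so a commonly suggested step is to map each record to plain
values (for example its key and value) before applying the window rather than
after. The following plain-Python sketch illustrates that ordering and a
sliding-window join; it is not Spark code. FakeRecord, to_plain,
can_serialize, and SlidingWindowJoin are hypothetical names, and pickle stands
in for Spark's serializer.

```python
import pickle
import threading
from collections import deque


class FakeRecord:
    """Hypothetical stand-in for a Kafka record.

    It holds a lock, so pickling it fails, much like serializing
    ConsumerRecord fails in the reported stack trace.
    """

    def __init__(self, table, key, value):
        self.table, self.key, self.value = table, key, value
        self._lock = threading.Lock()


def to_plain(record):
    # Keep only plain, serializable fields before any windowing.
    return (record.table, record.key, record.value)


def can_serialize(obj):
    try:
        pickle.dumps(obj)
        return True
    except Exception:
        return False


class SlidingWindowJoin:
    """Joins A and B rows found in the last `window_batches` batches."""

    def __init__(self, window_batches):
        self.window = deque(maxlen=window_batches)

    def add_batch(self, records):
        # Convert first, then store: the window only ever holds plain tuples.
        self.window.append([to_plain(r) for r in records])
        rows = [row for batch in self.window for row in batch]
        masters = {key: value for table, key, value in rows if table == "A"}
        return sorted(
            (key, masters[key], value)
            for table, key, value in rows
            if table == "B" and key in masters
        )


raw = FakeRecord("A", 1, "order-1")
raw_ok = can_serialize(raw)              # the record object itself
plain_ok = can_serialize(to_plain(raw))  # the extracted tuple

join = SlidingWindowJoin(window_batches=2)
first = join.add_batch([FakeRecord("A", 1, "order-1"),
                        FakeRecord("B", 1, "item-1a")])
second = join.add_batch([FakeRecord("B", 1, "item-1b")])
third = join.add_batch([FakeRecord("B", 1, "item-1c")])
```

Two properties of a windowed join show up in this sketch. Overlapping windows
emit the same joined row more than once, so the database write would need to
be idempotent. A B row that arrives after its A row has left the window is
still not joined, so the window length has to cover the longest expected gap.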
So I am wondering: is there a problem with my scenario, or is this a technical
problem on my side? Is there a solution for my business scenario that does not
require introducing other components? Any pointers would be appreciated.
Thanks.
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)