Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/19274
ping @lonelytrooper for @koeninger's comment. Otherwise, I propose closing this for now.
---
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/19274
Can one of the admins verify this patch?
Github user koeninger commented on the issue:
https://github.com/apache/spark/pull/19274
Search Jira and the mailing list; this idea has been brought up multiple times. I don't think breaking fundamental assumptions of Kafka (one consumer thread per group per partition) is a good idea.
Github user lonelytrooper commented on the issue:
https://github.com/apache/spark/pull/19274
Thank you so much for inviting more discussion!
Github user lonelytrooper commented on the issue:
https://github.com/apache/spark/pull/19274
I guessed that. True, this feature cannot ensure the ordering of data within one Kafka partition, but quite a few applications (such as log processing) do not need strict ordering
Github user jerryshao commented on the issue:
https://github.com/apache/spark/pull/19274
This is because it is the only way to guarantee the ordering of data when mapping a Kafka partition to a Spark partition. Some users may have taken this as an assumption when writing their code.
Github user lonelytrooper commented on the issue:
https://github.com/apache/spark/pull/19274
Hi Jerry, thank you so much for discussing! Actually, we tried 'repartition' before introducing this feature, and we gave it up for two reasons. First, it leads to a shuffle, which may
Github user jerryshao commented on the issue:
https://github.com/apache/spark/pull/19274
Yes, I understand your scenario, but my concern is that your proposal is quite scenario-specific; it may serve your scenario well, but it breaks the design purpose of KafkaRDD. From my
Github user lonelytrooper commented on the issue:
https://github.com/apache/spark/pull/19274
Will more executors be used in the RDD#mapPartitions way? I'll try that later to see if it works. I think if Spark provided a convenient way for this, it would help
Github user jerryshao commented on the issue:
https://github.com/apache/spark/pull/19274
Hi @loneknightpy, thinking a bit about your PR, I think this can also be done on the user side. Users could create several threads in one task (RDD#mapPartitions) to consume the records concurrently,
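The "several threads in one task" idea can be sketched roughly as below. This is a minimal, hypothetical Python sketch, not actual Spark or Kafka code: `process_partition` stands in for an RDD#mapPartitions callback and `handle` for per-record work; collecting results in completion order illustrates the ordering trade-off discussed above.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def process_partition(records, num_threads=4):
    """Consume one partition's records concurrently using a small thread pool."""
    def handle(record):
        # Stand-in for per-record work (e.g. parsing a log line).
        return record * 2

    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        futures = [pool.submit(handle, r) for r in records]
        # Results are gathered in completion order, so within-partition
        # ordering is no longer guaranteed -- the trade-off under discussion.
        return [f.result() for f in as_completed(futures)]
```

In real Spark this callback shape would be passed to `rdd.mapPartitions(...)`; here it is just a plain function over an iterable.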
Github user lonelytrooper commented on the issue:
https://github.com/apache/spark/pull/19274
Yes. One Kafka partition will map to many Spark partitions, thus more
executors can be used.
Github user jerryshao commented on the issue:
https://github.com/apache/spark/pull/19274
Will this break the assumption that one Kafka partition will map to one
Spark partition?
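The 1:N mapping being asked about can be sketched as a simple offset-range split: one Kafka partition's offset range is divided into several chunks, each of which would back its own Spark partition. This is an illustrative helper under assumed names, not the PR's actual code.

```python
def split_offset_range(from_offset, until_offset, num_splits):
    """Split one Kafka partition's [from, until) offset range into
    num_splits contiguous chunks, one per prospective Spark partition."""
    # Ceiling division so every offset is covered.
    step = (until_offset - from_offset + num_splits - 1) // num_splits
    return [(lo, min(lo + step, until_offset))
            for lo in range(from_offset, until_offset, step)]
```

For example, splitting offsets [0, 10) three ways yields the chunks (0, 4), (4, 8), (8, 10); each chunk preserves order internally, but the chunks may be processed on different executors in any order.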