spark git commit: [SPARK-19594][STRUCTURED STREAMING] StreamingQueryListener fails to handle QueryTerminatedEvent if more then one listeners exists

2017-02-26 Thread zsxwing
Repository: spark Updated Branches: refs/heads/master 68f2142cf -> 9f8e39215 [SPARK-19594][STRUCTURED STREAMING] StreamingQueryListener fails to handle QueryTerminatedEvent if more then one listeners exists ## What changes were proposed in this pull request? currently if multiple streaming

spark git commit: [SPARK-19617][SS] Fix the race condition when starting and stopping a query quickly

2017-02-17 Thread zsxwing
lem after we fix the race condition. ## How was this patch tested? Jenkins Author: Shixiong Zhu <shixi...@databricks.com> Closes #16947 from zsxwing/SPARK-19617. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/15b144d2 T

spark git commit: [SPARK-19517][SS] KafkaSource fails to initialize partition offsets

2017-02-17 Thread zsxwing
Repository: spark Updated Branches: refs/heads/branch-2.1 6e3abed8f -> b083ec511 [SPARK-19517][SS] KafkaSource fails to initialize partition offsets ## What changes were proposed in this pull request? This patch fixes a bug in `KafkaSource` with the (de)serialization of the length of the

spark git commit: [SPARK-19517][SS] KafkaSource fails to initialize partition offsets

2017-02-17 Thread zsxwing
Repository: spark Updated Branches: refs/heads/master 4cc06f4eb -> 1a3f5f8c5 [SPARK-19517][SS] KafkaSource fails to initialize partition offsets ## What changes were proposed in this pull request? This patch fixes a bug in `KafkaSource` with the (de)serialization of the length of the JSON

spark git commit: [SPARK-19603][SS] Fix StreamingQuery explain command

2017-02-15 Thread zsxwing
ks.com> Closes #16934 from zsxwing/SPARK-19603. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/fc02ef95 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/fc02ef95 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff

spark git commit: [SPARK-19603][SS] Fix StreamingQuery explain command

2017-02-15 Thread zsxwing
shixi...@databricks.com> Closes #16934 from zsxwing/SPARK-19603. (cherry picked from commit fc02ef95cdfc226603b52dc579b7133631f7143d) Signed-off-by: Shixiong Zhu <shixi...@databricks.com> Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/

spark git commit: [SPARK-19599][SS] Clean up HDFSMetadataLog

2017-02-15 Thread zsxwing
; Unit` and just call `serialize` directly. - Remove catching FileNotFoundException. ## How was this patch tested? Jenkins Author: Shixiong Zhu <shixi...@databricks.com> Closes #16932 from zsxwing/metadata-cleanup. (cherry picked from commit 21b4ba2d6f21a9759af879471715c123073bd67a

spark git commit: [SPARK-19599][SS] Clean up HDFSMetadataLog

2017-02-15 Thread zsxwing
; Unit` and just call `serialize` directly. - Remove catching FileNotFoundException. ## How was this patch tested? Jenkins Author: Shixiong Zhu <shixi...@databricks.com> Closes #16932 from zsxwing/metadata-cleanup. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: ht

spark git commit: [HOTFIX][SPARK-19542][SS]Fix the missing import in DataStreamReaderWriterSuite

2017-02-13 Thread zsxwing
Repository: spark Updated Branches: refs/heads/branch-2.1 328b22984 -> 2968d8c06 [HOTFIX][SPARK-19542][SS]Fix the missing import in DataStreamReaderWriterSuite Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/2968d8c0

spark git commit: [SPARK-17714][CORE][TEST-MAVEN][TEST-HADOOP2.6] Avoid using ExecutorClassLoader to load Netty generated classes

2017-02-13 Thread zsxwing
s Author: Shixiong Zhu <shixi...@databricks.com> Closes #16859 from zsxwing/SPARK-17714. (cherry picked from commit 905fdf0c243e1776c54c01a25b17878361400225) Signed-off-by: Shixiong Zhu <shixi...@databricks.com> Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: htt

spark git commit: [SPARK-17714][CORE][TEST-MAVEN][TEST-HADOOP2.6] Avoid using ExecutorClassLoader to load Netty generated classes

2017-02-13 Thread zsxwing
s Author: Shixiong Zhu <shixi...@databricks.com> Closes #16859 from zsxwing/SPARK-17714. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/905fdf0c Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/905fdf0c Diff: http:

spark git commit: [SPARK-19564][SPARK-19559][SS][KAFKA] KafkaOffsetReader's consumers should not be in the same group

2017-02-12 Thread zsxwing
Repository: spark Updated Branches: refs/heads/branch-2.1 06e77e009 -> fe4fcc570 [SPARK-19564][SPARK-19559][SS][KAFKA] KafkaOffsetReader's consumers should not be in the same group ## What changes were proposed in this pull request? In `KafkaOffsetReader`, when error occurs, we abort the

spark git commit: [SPARK-19564][SPARK-19559][SS][KAFKA] KafkaOffsetReader's consumers should not be in the same group

2017-02-12 Thread zsxwing
Repository: spark Updated Branches: refs/heads/master bc0a0e639 -> 2bdbc8705 [SPARK-19564][SPARK-19559][SS][KAFKA] KafkaOffsetReader's consumers should not be in the same group ## What changes were proposed in this pull request? In `KafkaOffsetReader`, when error occurs, we abort the

spark git commit: [SPARK-19413][SS] MapGroupsWithState for arbitrary stateful operations for branch-2.1

2017-02-08 Thread zsxwing
Repository: spark Updated Branches: refs/heads/branch-2.1 71b6eacf7 -> 502c927b8 [SPARK-19413][SS] MapGroupsWithState for arbitrary stateful operations for branch-2.1 This is a follow up PR for merging #16758 to spark 2.1 branch ## What changes were proposed in this pull request?

[1/2] spark git commit: [SPARK-18682][SS] Batch Source for Kafka

2017-02-07 Thread zsxwing
Repository: spark Updated Branches: refs/heads/branch-2.1 dd1abef13 -> e642a07d5 http://git-wip-us.apache.org/repos/asf/spark/blob/e642a07d/external/kafka-0-10-sql/src/test/scala/org/apache/spark/sql/kafka010/KafkaSourceSuite.scala

[2/2] spark git commit: [SPARK-18682][SS] Batch Source for Kafka

2017-02-07 Thread zsxwing
[SPARK-18682][SS] Batch Source for Kafka Today, you can start a stream that reads from kafka. However, given kafka's configurable retention period, it seems like sometimes you might just want to read all of the data that is available now. As such we should add a version that works with

[1/2] spark git commit: [SPARK-18682][SS] Batch Source for Kafka

2017-02-07 Thread zsxwing
Repository: spark Updated Branches: refs/heads/master 73ee73945 -> 8df03 http://git-wip-us.apache.org/repos/asf/spark/blob/8df0/external/kafka-0-10-sql/src/test/scala/org/apache/spark/sql/kafka010/KafkaSourceSuite.scala

[2/2] spark git commit: [SPARK-18682][SS] Batch Source for Kafka

2017-02-07 Thread zsxwing
[SPARK-18682][SS] Batch Source for Kafka ## What changes were proposed in this pull request? Today, you can start a stream that reads from kafka. However, given kafka's configurable retention period, it seems like sometimes you might just want to read all of the data that is available now. As

spark git commit: [SPARK-19407][SS] defaultFS is used FileSystem.get instead of getting it from uri scheme

2017-02-06 Thread zsxwing
Repository: spark Updated Branches: refs/heads/branch-2.1 f55bd4c73 -> 62fab5bee [SPARK-19407][SS] defaultFS is used FileSystem.get instead of getting it from uri scheme ## What changes were proposed in this pull request? ``` Caused by: java.lang.IllegalArgumentException: Wrong FS:

spark git commit: [SPARK-19407][SS] defaultFS is used FileSystem.get instead of getting it from uri scheme

2017-02-06 Thread zsxwing
Repository: spark Updated Branches: refs/heads/master fab0d62a7 -> 7a0a630e0 [SPARK-19407][SS] defaultFS is used FileSystem.get instead of getting it from uri scheme ## What changes were proposed in this pull request? ``` Caused by: java.lang.IllegalArgumentException: Wrong FS:

spark git commit: [SPARK-19437] Rectify spark executor id in HeartbeatReceiverSuite.

2017-02-02 Thread zsxwing
Repository: spark Updated Branches: refs/heads/master 1d5d2a9d0 -> c86a57f4d [SPARK-19437] Rectify spark executor id in HeartbeatReceiverSuite. ## What changes were proposed in this pull request? The current code in `HeartbeatReceiverSuite`, executorId is set as below: ``` private val

spark git commit: [SPARK-19432][CORE] Fix an unexpected failure when connecting timeout

2017-02-01 Thread zsxwing
e$1.apply(Future.scala:136) at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:32) ``` It's better to provide a meaningful message. ## How was this patch tested? Jenkins Author: Shixiong Zhu <shixi...@databricks.com> Closes #16773 from zsxwing/connect-timeout. Proj

spark git commit: [SPARK-19432][CORE] Fix an unexpected failure when connecting timeout

2017-02-01 Thread zsxwing
fun$onFailure$1.apply(Future.scala:136) at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:32) ``` It's better to provide a meaningful message. ## How was this patch tested? Jenkins Author: Shixiong Zhu <shixi...@databricks.com> Closes #16773 from zsxwing/connect-timeout. (cher

spark git commit: [SPARK-19377][WEBUI][CORE] Killed tasks should have the status as KILLED

2017-02-01 Thread zsxwing
Repository: spark Updated Branches: refs/heads/master 5ed397baa -> df4a27cc5 [SPARK-19377][WEBUI][CORE] Killed tasks should have the status as KILLED ## What changes were proposed in this pull request? Copying of the killed status was missing while getting the newTaskInfo object by dropping

spark git commit: [SPARK-19377][WEBUI][CORE] Killed tasks should have the status as KILLED

2017-02-01 Thread zsxwing
Repository: spark Updated Branches: refs/heads/branch-2.1 61cdc8c7c -> f94646415 [SPARK-19377][WEBUI][CORE] Killed tasks should have the status as KILLED ## What changes were proposed in this pull request? Copying of the killed status was missing while getting the newTaskInfo object by

spark git commit: [SPARK-19365][CORE] Optimize RequestMessage serialization

2017-01-27 Thread zsxwing
2679 5 2760 6 2710 7 2747 8 2793 9 2679 10 2651 ``` I also captured the TCP packets for this test. Before this patch, the total size of TCP packets is ~1.5GB. After it, it reduces to ~1.2GB. Author: Shixiong Zhu <shixi...@databricks.com> Closes #16

spark git commit: [SPARK-19330][DSTREAMS] Also show tooltip for successful batches

2017-01-24 Thread zsxwing
Repository: spark Updated Branches: refs/heads/branch-2.1 b94fb284b -> c13378796 [SPARK-19330][DSTREAMS] Also show tooltip for successful batches ## What changes were proposed in this pull request? ### Before

spark git commit: [SPARK-19330][DSTREAMS] Also show tooltip for successful batches

2017-01-24 Thread zsxwing
Repository: spark Updated Branches: refs/heads/master 15ef3740d -> 40a4cfc7c [SPARK-19330][DSTREAMS] Also show tooltip for successful batches ## What changes were proposed in this pull request? ### Before

[2/2] spark git commit: [SPARK-19139][CORE] New auth mechanism for transport library.

2017-01-24 Thread zsxwing
[SPARK-19139][CORE] New auth mechanism for transport library. This change introduces a new auth mechanism to the transport library, to be used when users enable strong encryption. This auth mechanism has better security than the currently used DIGEST-MD5. The new protocol uses symmetric key

[1/2] spark git commit: [SPARK-19139][CORE] New auth mechanism for transport library.

2017-01-24 Thread zsxwing
Repository: spark Updated Branches: refs/heads/master d9783380f -> 8f3f73abc http://git-wip-us.apache.org/repos/asf/spark/blob/8f3f73ab/common/network-common/src/test/java/org/apache/spark/network/crypto/AuthIntegrationSuite.java

spark git commit: [SPARK-19268][SS] Disallow adaptive query execution for streaming queries

2017-01-23 Thread zsxwing
hes, it may break streaming queries. Hence, we should disallow this feature in Structured Streaming. ## How was this patch tested? `test("SPARK-19268: Adaptive query execution should be disallowed")`. Author: Shixiong Zhu <shixi...@databricks.com> Closes #16683 from zsxwing/SPA

spark git commit: [SPARK-19268][SS] Disallow adaptive query execution for streaming queries

2017-01-23 Thread zsxwing
hes, it may break streaming queries. Hence, we should disallow this feature in Structured Streaming. ## How was this patch tested? `test("SPARK-19268: Adaptive query execution should be disallowed")`. Author: Shixiong Zhu <shixi...@databricks.com> Closes #16683 from zsxwing/SP

spark git commit: [SPARK-19182][DSTREAM] Optimize the lock in StreamingJobProgressListener to not block UI when generating Streaming jobs

2017-01-18 Thread zsxwing
ter to fetch metadata). It's better to optimize the locks in DStreamGraph and StreamingJobProgressListener to make the UI not block by job generation. ## How was this patch tested? existing ut cc zsxwing Author: uncleGen <husty...@gmail.com> Closes #16601 from uncleGen/SPARK-19182. Proj

spark git commit: [SPARK-19168][STRUCTURED STREAMING] StateStore should be aborted upon error

2017-01-18 Thread zsxwing
Repository: spark Updated Branches: refs/heads/master c050c1227 -> 569e50680 [SPARK-19168][STRUCTURED STREAMING] StateStore should be aborted upon error ## What changes were proposed in this pull request? We should call `StateStore.abort()` when there should be any error before the store is

spark git commit: [SPARK-19168][STRUCTURED STREAMING] StateStore should be aborted upon error

2017-01-18 Thread zsxwing
Repository: spark Updated Branches: refs/heads/branch-2.1 047506bae -> 4cff0b504 [SPARK-19168][STRUCTURED STREAMING] StateStore should be aborted upon error ## What changes were proposed in this pull request? We should call `StateStore.abort()` when there should be any error before the

spark git commit: [SPARK-19113][SS][TESTS] Ignore StreamingQueryException thrown from awaitInitialization to avoid breaking tests

2017-01-18 Thread zsxwing
tch `StreamingQueryException` as well. ## How was this patch tested? Jenkins Author: Shixiong Zhu <shixi...@databricks.com> Closes #16567 from zsxwing/SPARK-19113-2. (cherry picked from commit c050c12274fba2ac4c4938c4724049a47fa59280) Signed-off-by: Shixiong Zhu <shixi...@databricks.com> Project:

spark git commit: [SPARK-19113][SS][TESTS] Ignore StreamingQueryException thrown from awaitInitialization to avoid breaking tests

2017-01-18 Thread zsxwing
tch `StreamingQueryException` as well. ## How was this patch tested? Jenkins Author: Shixiong Zhu <shixi...@databricks.com> Closes #16567 from zsxwing/SPARK-19113-2. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/c050c122 Tree: http:

spark git commit: [SPARK-18905][STREAMING] Fix the issue of removing a failed jobset from JobScheduler.jobSets

2017-01-16 Thread zsxwing
Repository: spark Updated Branches: refs/heads/master c84f7d3e1 -> f8db8945f [SPARK-18905][STREAMING] Fix the issue of removing a failed jobset from JobScheduler.jobSets ## What changes were proposed in this pull request? the current implementation of Spark streaming considers a batch is

spark git commit: [SPARK-18905][STREAMING] Fix the issue of removing a failed jobset from JobScheduler.jobSets

2017-01-16 Thread zsxwing
Repository: spark Updated Branches: refs/heads/branch-2.1 975890507 -> f4317be66 [SPARK-18905][STREAMING] Fix the issue of removing a failed jobset from JobScheduler.jobSets ## What changes were proposed in this pull request? the current implementation of Spark streaming considers a batch

spark git commit: [SPARK-19140][SS] Allow update mode for non-aggregation streaming queries

2017-01-10 Thread zsxwing
ame as the append mode if a query has no aggregations. ## How was this patch tested? Jenkins Author: Shixiong Zhu <shixi...@databricks.com> Closes #16520 from zsxwing/update-without-agg. (cherry picked from commit bc6c56e940fe93591a1e5ba45751f1b243b57e28) Signed-off-by: Shixiong Z

spark git commit: [SPARK-19140][SS] Allow update mode for non-aggregation streaming queries

2017-01-10 Thread zsxwing
ame as the append mode if a query has no aggregations. ## How was this patch tested? Jenkins Author: Shixiong Zhu <shixi...@databricks.com> Closes #16520 from zsxwing/update-without-agg. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spa

spark git commit: [SPARK-19113][SS][TESTS] Set UncaughtExceptionHandler in onQueryStarted to ensure catching fatal errors during query initialization

2017-01-10 Thread zsxwing
ets `UncaughtExceptionHandler` after starting the query now. It may not be able to catch fatal errors during query initialization. This PR uses `onQueryStarted` callback to fix it. ## How was this patch tested? Jenkins Author: Shixiong Zhu <shixi...@databricks.com> Closes #16492 from zsxwing/SP

spark git commit: [SPARK-19137][SQL] Fix `withSQLConf` to reset `OptionalConfigEntry` correctly

2017-01-10 Thread zsxwing
Repository: spark Updated Branches: refs/heads/branch-2.1 65c866ef9 -> 69d1c4c5c [SPARK-19137][SQL] Fix `withSQLConf` to reset `OptionalConfigEntry` correctly ## What changes were proposed in this pull request? `DataStreamReaderWriterSuite` makes test files in source folder like the

spark git commit: [SPARK-19137][SQL] Fix `withSQLConf` to reset `OptionalConfigEntry` correctly

2017-01-10 Thread zsxwing
Repository: spark Updated Branches: refs/heads/master 3ef183a94 -> d5b1dc934 [SPARK-19137][SQL] Fix `withSQLConf` to reset `OptionalConfigEntry` correctly ## What changes were proposed in this pull request? `DataStreamReaderWriterSuite` makes test files in source folder like the followings.

spark git commit: [SPARK-19041][SS] Fix code snippet compilation issues in Structured Streaming Programming Guide

2017-01-02 Thread zsxwing
Repository: spark Updated Branches: refs/heads/branch-2.1 517f39833 -> d489e1dc7 [SPARK-19041][SS] Fix code snippet compilation issues in Structured Streaming Programming Guide ## What changes were proposed in this pull request? Currently some code snippets in the programming guide just do

spark git commit: [SPARK-19050][SS][TESTS] Fix EventTimeWatermarkSuite 'delay in months and years handled correctly'

2017-01-01 Thread zsxwing
num)`, so `monthDiff` has two possible values. ## How was this patch tested? Jenkins. Author: Shixiong Zhu <shixi...@databricks.com> Closes #16449 from zsxwing/watermark-test-hotfix. (cherry picked from commit 2394047370d2d93bd8bc57b996fee47465c470af) Signed-off-by: Shixiong Z

spark git commit: [SPARK-19050][SS][TESTS] Fix EventTimeWatermarkSuite 'delay in months and years handled correctly'

2017-01-01 Thread zsxwing
so `monthDiff` has two possible values. ## How was this patch tested? Jenkins. Author: Shixiong Zhu <shixi...@databricks.com> Closes #16449 from zsxwing/watermark-test-hotfix. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spa

spark git commit: [SPARK-18669][SS][DOCS] Update Apache docs for Structured Streaming regarding watermarking and status

2016-12-28 Thread zsxwing
Repository: spark Updated Branches: refs/heads/branch-2.1 7197a7bc7 -> 80d583bd0 [SPARK-18669][SS][DOCS] Update Apache docs for Structured Streaming regarding watermarking and status ## What changes were proposed in this pull request? - Extended the Window operation section with code

spark git commit: [SPARK-18669][SS][DOCS] Update Apache docs for Structured Streaming regarding watermarking and status

2016-12-28 Thread zsxwing
Repository: spark Updated Branches: refs/heads/master 6a475ae46 -> 092c6725b [SPARK-18669][SS][DOCS] Update Apache docs for Structured Streaming regarding watermarking and status ## What changes were proposed in this pull request? - Extended the Window operation section with code snippet

spark git commit: [SPARK-17755][CORE] Use workerRef to send RegisterWorkerResponse to avoid the race condition

2016-12-25 Thread zsxwing
nce `LaunchExecutor` will always be after `RegisterWorkerResponse` and never be ignored. ## How was this patch tested? Jenkins Author: Shixiong Zhu <shixi...@databricks.com> Closes #16345 from zsxwing/SPARK-17755. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http:

spark git commit: [SPARK-18991][CORE] Change ContextCleaner.referenceBuffer to use ConcurrentHashMap to make it faster

2016-12-23 Thread zsxwing
O(1). Changing ContextCleaner.referenceBuffer's type from `ConcurrentLinkedQueue` to `ConcurrentHashMap's` will make the removal much faster. ## How was this patch tested? Jenkins Author: Shixiong Zhu <shixi...@databricks.com> Closes #16390 from zsxwing/SPARK-18991. Project: http:

spark git commit: [SPARK-18991][CORE] Change ContextCleaner.referenceBuffer to use ConcurrentHashMap to make it faster

2016-12-23 Thread zsxwing
s O(1). Changing ContextCleaner.referenceBuffer's type from `ConcurrentLinkedQueue` to `ConcurrentHashMap's` will make the removal much faster. ## How was this patch tested? Jenkins Author: Shixiong Zhu <shixi...@databricks.com> Closes #16390 from zsxwing/SPARK-18991. (cherry picked fr
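
The SPARK-18991 message above contrasts removing an entry from a `ConcurrentLinkedQueue` with removing it from a `ConcurrentHashMap`. A minimal Java sketch of that difference follows. The class `ReferenceBufferSketch` and its methods are hypothetical and only illustrate the two data structures; this is not the actual `ContextCleaner` code.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentLinkedQueue;

// Hypothetical illustration only; not Spark's ContextCleaner implementation.
final class ReferenceBufferSketch {
    // Queue-backed buffer: remove(Object) scans the queue, so it is O(n).
    private final ConcurrentLinkedQueue<Object> queueBuffer = new ConcurrentLinkedQueue<>();

    // Set view backed by a ConcurrentHashMap: remove(Object) is a hash lookup.
    private final Set<Object> hashBuffer = ConcurrentHashMap.newKeySet();

    // Track the same reference in both buffers so the two removals can be compared.
    void track(Object ref) {
        queueBuffer.add(ref);
        hashBuffer.add(ref);
    }

    // Linear-time removal: walks the queue until it finds the element.
    boolean removeFromQueue(Object ref) {
        return queueBuffer.remove(ref);
    }

    // Expected constant-time removal: hashes the element and unlinks its entry.
    boolean removeFromHash(Object ref) {
        return hashBuffer.remove(ref);
    }

    int hashSize() {
        return hashBuffer.size();
    }
}
```

With many tracked references and frequent removals, the queue-backed variant pays for a scan on every removal, which is the cost the message says the change avoids.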

spark git commit: [SPARK-18972][CORE] Fix the netty thread names for RPC

2016-12-22 Thread zsxwing
ake ShuffleBlockFetcherIterator throw NoSuchElementException if it has no more elements. Otherwise, if the caller calls `next` without `hasNext`, it will just hang. ## How was this patch tested? Jenkins Author: Shixiong Zhu <shixi...@databricks.com> Closes #16380 from zsxwing/SPARK-18972. (cherry picked fr

spark git commit: [SPARK-18972][CORE] Fix the netty thread names for RPC

2016-12-22 Thread zsxwing
ake ShuffleBlockFetcherIterator throw NoSuchElementException if it has no more elements. Otherwise, if the caller calls `next` without `hasNext`, it will just hang. ## How was this patch tested? Jenkins Author: Shixiong Zhu <shixi...@databricks.com> Closes #16380 from zsxwing/SPARK-18972. (cherry picked fr

spark git commit: [SPARK-18972][CORE] Fix the netty thread names for RPC

2016-12-22 Thread zsxwing
ake ShuffleBlockFetcherIterator throw NoSuchElementException if it has no more elements. Otherwise, if the caller calls `next` without `hasNext`, it will just hang. ## How was this patch tested? Jenkins Author: Shixiong Zhu <shixi...@databricks.com> Closes #16380 from zsxwing/SPARK-18972. Project: http:
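
The three SPARK-18972 snippets above mention making an iterator throw `NoSuchElementException` instead of hanging when `next` is called with nothing left. A minimal Java sketch of that guard, assuming an iterator that blocks on a queue of results and knows how many results to expect. `BoundedResultIterator` and its fields are hypothetical names, not Spark's `ShuffleBlockFetcherIterator`.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical illustration only; not Spark's ShuffleBlockFetcherIterator.
final class BoundedResultIterator<T> implements Iterator<T> {
    private final BlockingQueue<T> results;
    private final int expected; // total number of results that will ever arrive
    private int consumed = 0;

    BoundedResultIterator(BlockingQueue<T> results, int expected) {
        this.results = results;
        this.expected = expected;
    }

    @Override
    public boolean hasNext() {
        return consumed < expected;
    }

    @Override
    public T next() {
        // Without this check, take() below would block forever once all
        // expected results have been consumed.
        if (!hasNext()) {
            throw new NoSuchElementException("no more results");
        }
        try {
            T value = results.take();
            consumed++;
            return value;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for a result", e);
        }
    }
}
```

A caller that skips `hasNext` now gets an exception it can see, rather than a thread parked on an empty queue.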

spark git commit: [SPARK-18985][SS] Add missing @InterfaceStability.Evolving for Structured Streaming APIs

2016-12-22 Thread zsxwing
PIs ## How was this patch tested? Compiling the codes. Author: Shixiong Zhu <shixi...@databricks.com> Closes #16385 from zsxwing/SPARK-18985. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/2246ce88 Tree: http:

spark git commit: [SPARK-18985][SS] Add missing @InterfaceStability.Evolving for Structured Streaming APIs

2016-12-22 Thread zsxwing
PIs ## How was this patch tested? Compiling the codes. Author: Shixiong Zhu <shixi...@databricks.com> Closes #16385 from zsxwing/SPARK-18985. (cherry picked from commit 2246ce88ae6bf842cf325ee3efcb7bea53f8ca37) Signed-off-by: Shixiong Zhu <shixi...@databricks.com> Project:

spark git commit: [SPARK-18031][TESTS] Fix flaky test ExecutorAllocationManagerSuite.basic functionality

2016-12-22 Thread zsxwing
Closes #16321 from zsxwing/SPARK-18031. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/542be408 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/542be408 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/542be408

spark git commit: [SPARK-18908][SS] Creating StreamingQueryException should check if logicalPlan is created

2016-12-21 Thread zsxwing
ted `test("max files per trigger - incorrect values")`. I found this issue when I switched from `testStream` to the real codes to verify the failure in this test. Author: Shixiong Zhu <shixi...@databricks.com> Closes #16322 from zsxwing/SPARK-18907. (cherry

spark git commit: [SPARK-18954][TESTS] Fix flaky test: o.a.s.streaming.BasicOperationsSuite rdd cleanup - map and window

2016-12-21 Thread zsxwing
may not be able to finish before stopping StreamingContext. This PR basically just puts the assertions into `eventually` and runs it before stopping StreamingContext. ## How was this patch tested? Jenkins Author: Shixiong Zhu <shixi...@databricks.com> Closes #16362 from zsxwing/SPARK-18954.

spark git commit: [SPARK-18954][TESTS] Fix flaky test: o.a.s.streaming.BasicOperationsSuite rdd cleanup - map and window

2016-12-21 Thread zsxwing
may not be able to finish before stopping StreamingContext. This PR basically just puts the assertions into `eventually` and runs it before stopping StreamingContext. ## How was this patch tested? Jenkins Author: Shixiong Zhu <shixi...@databricks.com> Closes #16362 from zsxwing/SPARK-18954.

spark git commit: [SPARK-18954][TESTS] Fix flaky test: o.a.s.streaming.BasicOperationsSuite rdd cleanup - map and window

2016-12-21 Thread zsxwing
may not be able to finish before stopping StreamingContext. This PR basically just puts the assertions into `eventually` and runs it before stopping StreamingContext. ## How was this patch tested? Jenkins Author: Shixiong Zhu <shixi...@databricks.com> Closes #16362 from zsxwing/SPARK-18954. Proj
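
The SPARK-18954 snippets above describe moving assertions into `eventually` so they are retried until they pass. The sketch below shows the general retry-until-timeout idea in Java. The `Eventually` class, its method signature, and the timeout parameters are assumptions made for illustration; they are not the helper used in the Spark test suite.

```java
// Hypothetical illustration only; not the helper used by the Spark tests.
final class Eventually {
    private Eventually() {}

    // Re-runs the assertion until it stops throwing AssertionError, or
    // rethrows the last failure once timeoutMillis has elapsed.
    static void eventually(long timeoutMillis, long intervalMillis, Runnable assertion) {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (true) {
            try {
                assertion.run();
                return; // assertion passed
            } catch (AssertionError e) {
                // Overflow-safe comparison of nanoTime values.
                if (System.nanoTime() - deadline >= 0) {
                    throw e;
                }
                try {
                    Thread.sleep(intervalMillis);
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("interrupted while retrying", ie);
                }
            }
        }
    }
}
```

A check that depends on asynchronous cleanup can then be wrapped as `Eventually.eventually(10_000, 50, () -> { ... })` and run before the surrounding context is stopped.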

spark git commit: [SPARK-18894][SS] Fix event time watermark delay threshold specified in months or years

2016-12-21 Thread zsxwing
Repository: spark Updated Branches: refs/heads/branch-2.1 bc54a14b4 -> 3c8861d92 [SPARK-18894][SS] Fix event time watermark delay threshold specified in months or years ## What changes were proposed in this pull request? Two changes - Fix how delays specified in months and years are

spark git commit: [SPARK-18894][SS] Fix event time watermark delay threshold specified in months or years

2016-12-21 Thread zsxwing
Repository: spark Updated Branches: refs/heads/master 1a6438897 -> 607a1e63d [SPARK-18894][SS] Fix event time watermark delay threshold specified in months or years ## What changes were proposed in this pull request? Two changes - Fix how delays specified in months and years are translated

spark git commit: [SPARK-18927][SS] MemorySink for StructuredStreaming can't recover from checkpoint if location is provided in SessionConf

2016-12-20 Thread zsxwing
Repository: spark Updated Branches: refs/heads/branch-2.1 cd297c390 -> 3857d5ba8 [SPARK-18927][SS] MemorySink for StructuredStreaming can't recover from checkpoint if location is provided in SessionConf ## What changes were proposed in this pull request? Checkpoint Location can be defined

spark git commit: [SPARK-18927][SS] MemorySink for StructuredStreaming can't recover from checkpoint if location is provided in SessionConf

2016-12-20 Thread zsxwing
Repository: spark Updated Branches: refs/heads/master 95c95b71e -> caed89321 [SPARK-18927][SS] MemorySink for StructuredStreaming can't recover from checkpoint if location is provided in SessionConf ## What changes were proposed in this pull request? Checkpoint Location can be defined for a

spark git commit: [SPARK-18904][SS][TESTS] Merge two FileStreamSourceSuite files

2016-12-16 Thread zsxwing
ins Author: Shixiong Zhu <shixi...@databricks.com> Closes #16315 from zsxwing/FileStreamSourceSuite. (cherry picked from commit 4faa8a3ec0bae4b210bc5d79918e008ab218f55a) Signed-off-by: Shixiong Zhu <shixi...@databricks.com> Project: http://git-wip-us.apache.org/repos/asf/spark/repo Com

spark git commit: [SPARK-18904][SS][TESTS] Merge two FileStreamSourceSuite files

2016-12-16 Thread zsxwing
hor: Shixiong Zhu <shixi...@databricks.com> Closes #16315 from zsxwing/FileStreamSourceSuite. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/4faa8a3e Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/4faa8a3e D

spark git commit: [SPARK-18868][FLAKY-TEST] Deflake StreamingQueryListenerSuite: single listener, check trigger...

2016-12-15 Thread zsxwing
Repository: spark Updated Branches: refs/heads/master 32ff96452 -> 9c7f83b02 [SPARK-18868][FLAKY-TEST] Deflake StreamingQueryListenerSuite: single listener, check trigger... ## What changes were proposed in this pull request? Use `recentProgress` instead of `lastProgress` and filter out

spark git commit: [SPARK-18868][FLAKY-TEST] Deflake StreamingQueryListenerSuite: single listener, check trigger...

2016-12-15 Thread zsxwing
Repository: spark Updated Branches: refs/heads/branch-2.1 a7364a82e -> 08e427287 [SPARK-18868][FLAKY-TEST] Deflake StreamingQueryListenerSuite: single listener, check trigger... ## What changes were proposed in this pull request? Use `recentProgress` instead of `lastProgress` and filter out

spark git commit: [SPARK-18852][SS] StreamingQuery.lastProgress should be null when recentProgress is empty

2016-12-14 Thread zsxwing
ows NoSuchElementException and it's hard to be used in Python since Python user will just see Py4jError. This PR just makes it return null instead. ## How was this patch tested? `test("lastProgress should be null when recentProgress is empty")` Author: Shixiong Zhu <shixi...@databricks.com> Closes

spark git commit: [SPARK-18852][SS] StreamingQuery.lastProgress should be null when recentProgress is empty

2016-12-14 Thread zsxwing
ion and it's hard to be used in Python since Python user will just see Py4jError. This PR just makes it return null instead. ## How was this patch tested? `test("lastProgress should be null when recentProgress is empty")` Author: Shixiong Zhu <shixi...@databricks.com> Closes

spark git commit: [SPARK-18843][CORE] Fix timeout in awaitResultInForkJoinSafely (branch 2.1, 2.0)

2016-12-13 Thread zsxwing
2.1 and 2.0. Master has been fixed by https://github.com/apache/spark/pull/16230. ## How was this patch tested? Jenkins Author: Shixiong Zhu <shixi...@databricks.com> Closes #16268 from zsxwing/SPARK-18843. (cherry picked from commit f672bfdf9689c0ab74226b11785ada50b72cd488) Signed-off-by: Shi

spark git commit: [SPARK-18843][CORE] Fix timeout in awaitResultInForkJoinSafely (branch 2.1, 2.0)

2016-12-13 Thread zsxwing
2.1 and 2.0. Master has been fixed by https://github.com/apache/spark/pull/16230. ## How was this patch tested? Jenkins Author: Shixiong Zhu <shixi...@databricks.com> Closes #16268 from zsxwing/SPARK-18843. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.a

spark git commit: [SPARK-18773][CORE] Make commons-crypto config translation consistent.

2016-12-12 Thread zsxwing
Repository: spark Updated Branches: refs/heads/master 8a51cfdca -> bc59951ba [SPARK-18773][CORE] Make commons-crypto config translation consistent. This change moves the logic that translates Spark configuration to commons-crypto configuration to the network-common module. It also extends

spark git commit: [SPARK-18790][SS] Keep a general offset history of stream batches

2016-12-11 Thread zsxwing
ied number of batches: the offsets that are present in each batch versions of the state store the files lists stored for the FileStreamSource the metadata log stored by the FileStreamSink marmbrus zsxwing ## How was this patch tested? The following tests were added. ### StreamExecution off

spark git commit: [SPARK-18790][SS] Keep a general offset history of stream batches

2016-12-11 Thread zsxwing
ied number of batches: the offsets that are present in each batch versions of the state store the files lists stored for the FileStreamSource the metadata log stored by the FileStreamSink marmbrus zsxwing ## How was this patch tested? The following tests were added. ### StreamExecution offset metad

spark git commit: [SPARK-18811] StreamSource resolution should happen in stream execution thread

2016-12-09 Thread zsxwing
Repository: spark Updated Branches: refs/heads/branch-2.1 8bf56cc46 -> b020ce408 [SPARK-18811] StreamSource resolution should happen in stream execution thread ## What changes were proposed in this pull request? When you start a stream, if we are trying to resolve the source of the stream,

spark git commit: [SPARK-18811] StreamSource resolution should happen in stream execution thread

2016-12-09 Thread zsxwing
Repository: spark Updated Branches: refs/heads/master 3e11d5bfe -> 63c915987 [SPARK-18811] StreamSource resolution should happen in stream execution thread ## What changes were proposed in this pull request? When you start a stream, if we are trying to resolve the source of the stream, for

spark git commit: [SPARK-4105] retry the fetch or stage if shuffle block is corrupt

2016-12-09 Thread zsxwing
Repository: spark Updated Branches: refs/heads/master d60ab5fd9 -> cf33a8628 [SPARK-4105] retry the fetch or stage if shuffle block is corrupt ## What changes were proposed in this pull request? There is an outstanding issue that existed for a long time: Sometimes the shuffle blocks are

spark git commit: [SPARK-18751][CORE] Fix deadlock when SparkContext.stop is called in Utils.tryOrStopSparkContext

2016-12-08 Thread zsxwing
ate the potential deadlock. I also removed my changes in #15775 since they are not necessary now. ## How was this patch tested? Jenkins Author: Shixiong Zhu <shixi...@databricks.com> Closes #16178 from zsxwing/fix-stop-deadlock. Project: http://git-wip-us.apache.org/repos/asf/spark/re

spark git commit: [SPARK-18751][CORE] Fix deadlock when SparkContext.stop is called in Utils.tryOrStopSparkContext

2016-12-08 Thread zsxwing
ses it to eliminate the potential deadlock. I also removed my changes in #15775 since they are not necessary now. ## How was this patch tested? Jenkins Author: Shixiong Zhu <shixi...@databricks.com> Closes #16178 from zsxwing/fix-stop-deadlock. (cherry picked fr

spark git commit: [SPARK-18764][CORE] Add a warning log when skipping a corrupted file

2016-12-07 Thread zsxwing
hen we want to finish the job first, then find them in the log and fix these files. ## How was this patch tested? Jenkins Author: Shixiong Zhu <shixi...@databricks.com> Closes #16192 from zsxwing/SPARK-18764. (cherry picked from commit dbf3e298a1a35c0243f087814ddf88034ff96d66) S

spark git commit: [SPARK-18764][CORE] Add a warning log when skipping a corrupted file

2016-12-07 Thread zsxwing
we want to finish the job first, then find them in the log and fix these files. ## How was this patch tested? Jenkins Author: Shixiong Zhu <shixi...@databricks.com> Closes #16192 from zsxwing/SPARK-18764. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http:

spark git commit: [SPARK-18734][SS] Represent timestamp in StreamingQueryProgress as formatted string instead of millis

2016-12-06 Thread zsxwing
Repository: spark Updated Branches: refs/heads/branch-2.1 65f5331a7 -> 9b5bc2a6a [SPARK-18734][SS] Represent timestamp in StreamingQueryProgress as formatted string instead of millis ## What changes were proposed in this pull request? Easier to read while debugging as a formatted string (in

spark git commit: [SPARK-18734][SS] Represent timestamp in StreamingQueryProgress as formatted string instead of millis

2016-12-06 Thread zsxwing
Repository: spark Updated Branches: refs/heads/master 4cc8d8906 -> 539bb3cf9 [SPARK-18734][SS] Represent timestamp in StreamingQueryProgress as formatted string instead of millis ## What changes were proposed in this pull request? Easier to read while debugging as a formatted string (in

spark git commit: [SPARK-18744][CORE] Remove workaround for Netty memory leak

2016-12-06 Thread zsxwing
tty/netty/issues/5833 Now we can remove them as it's fixed in Netty 4.0.42.Final. ## How was this patch tested? Jenkins Author: Shixiong Zhu <shixi...@databricks.com> Closes #16167 from zsxwing/remove-netty-workaround. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: h

spark git commit: [SPARK-18671][SS][TEST] Added tests to ensure stability of that all Structured Streaming log formats

2016-12-06 Thread zsxwing
Repository: spark Updated Branches: refs/heads/master cb1f10b46 -> 1ef6b296d [SPARK-18671][SS][TEST] Added tests to ensure stability of that all Structured Streaming log formats ## What changes were proposed in this pull request? To be able to restart StreamingQueries across Spark version,

spark git commit: [SPARK-18671][SS][TEST] Added tests to ensure stability of that all Structured Streaming log formats

2016-12-06 Thread zsxwing
Repository: spark Updated Branches: refs/heads/branch-2.1 ace4079c5 -> d20e0d6b8 [SPARK-18671][SS][TEST] Added tests to ensure stability of that all Structured Streaming log formats ## What changes were proposed in this pull request? To be able to restart StreamingQueries across Spark

spark git commit: [SPARK-18694][SS] Add StreamingQuery.explain and exception to Python and fix StreamingQueryException (branch 2.1)

2016-12-05 Thread zsxwing
How was this patch tested? Jenkins Author: Shixiong Zhu <shixi...@databricks.com> Closes #16153 from zsxwing/SPARK-18694-2.1. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/c6a4e3d9 Tree: http://git-wip-us.apache.org/repos/

spark git commit: [SPARK-18617][SPARK-18560][TESTS] Fix flaky test: StreamingContextSuite. Receiver data should be deserialized properly

2016-12-01 Thread zsxwing
ads to stop StreamingContext. Otherwise, the latch added in #16091 can be passed too early. ## How was this patch tested? Jenkins Author: Shixiong Zhu <shixi...@databricks.com> Closes #16105 from zsxwing/SPARK-18617-2. (cherry picked from commit 086b0c8f6788b205bc630d5ccf078f77b9751af3) S

spark git commit: [SPARK-18655][SS] Ignore Structured Streaming 2.0.2 logs in history server

2016-11-30 Thread zsxwing
kson.databind.deser.BeanDeserializerBase.deserializeFromObjectUsingNonDefault(BeanDeserializerBase.java:1099) ... ``` This PR just ignores such errors and adds a test to make sure we can read 2.0.2 logs. ## How was this patch tested? `query-event-logs-version-2.0.2.txt` has all types of events generated

spark git commit: [SPARK-18655][SS] Ignore Structured Streaming 2.0.2 logs in history server

2016-11-30 Thread zsxwing
bind.deser.BeanDeserializerBase.deserializeFromObjectUsingNonDefault(BeanDeserializerBase.java:1099) ... ``` This PR just ignores such errors and adds a test to make sure we can read 2.0.2 logs. ## How was this patch tested? `query-event-logs-version-2.0.2.txt` has all types of events generate

spark git commit: [SPARK-18188] add checksum for blocks of broadcast

2016-11-29 Thread zsxwing
Repository: spark Updated Branches: refs/heads/branch-2.1 ea6957da2 -> 06a56df22 [SPARK-18188] add checksum for blocks of broadcast ## What changes were proposed in this pull request? A TorrentBroadcast is serialized and compressed first, then splitted as fixed size blocks, if any block is

spark git commit: [SPARK-18188] add checksum for blocks of broadcast

2016-11-29 Thread zsxwing
Repository: spark Updated Branches: refs/heads/master 3c0beea47 -> 7d5cb3af7 [SPARK-18188] add checksum for blocks of broadcast ## What changes were proposed in this pull request? A TorrentBroadcast is serialized and compressed first, then splitted as fixed size blocks, if any block is

spark git commit: [SPARK-18547][CORE] Propagate I/O encryption key when executors register.

2016-11-28 Thread zsxwing
Repository: spark Updated Branches: refs/heads/branch-2.1 45e2b3c0e -> c4cbdc864 [SPARK-18547][CORE] Propagate I/O encryption key when executors register. This change modifies the method used to propagate encryption keys used during shuffle. Instead of relying on YARN's UserGroupInformation

spark git commit: [SPARK-18547][CORE] Propagate I/O encryption key when executors register.

2016-11-28 Thread zsxwing
Repository: spark Updated Branches: refs/heads/master 1633ff3b6 -> 8b325b17e [SPARK-18547][CORE] Propagate I/O encryption key when executors register. This change modifies the method used to propagate encryption keys used during shuffle. Instead of relying on YARN's UserGroupInformation

spark git commit: [SPARK-18588][SS][KAFKA] Ignore the flaky kafka test

2016-11-28 Thread zsxwing
ins Author: Shixiong Zhu <shixi...@databricks.com> Closes #16051 from zsxwing/ignore-flaky-kafka-test. (cherry picked from commit 1633ff3b6c97e33191859f34c868782cbb0972fd) Signed-off-by: Shixiong Zhu <shixi...@databricks.com> Project: http://git-wip-us.apache.org/repos/asf/spark/re

spark git commit: [SPARK-18588][SS][KAFKA] Ignore the flaky kafka test

2016-11-28 Thread zsxwing
ins Author: Shixiong Zhu <shixi...@databricks.com> Closes #16051 from zsxwing/ignore-flaky-kafka-test. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/1633ff3b Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/1633

spark git commit: [SPARK-18530][SS][KAFKA] Change Kafka timestamp column type to TimestampType

2016-11-22 Thread zsxwing
tch tested? `test("Kafka column types")`. Author: Shixiong Zhu <shixi...@databricks.com> Closes #15969 from zsxwing/SPARK-18530. (cherry picked from commit d0212eb0f22473ee5482fe98dafc24e16ffcfc63) Signed-off-by: Shixiong Zhu <shixi...@databricks.com> Project: http://git-wi
