spark git commit: [SPARK-18530][SS][KAFKA] Change Kafka timestamp column type to TimestampType

2016-11-22 Thread zsxwing
ted? `test("Kafka column types")`. Author: Shixiong Zhu <shixi...@databricks.com> Closes #15969 from zsxwing/SPARK-18530. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/d0212eb0 Tree: http://git-wip-us.apache.

spark git commit: [SPARK-18425][STRUCTURED STREAMING][TESTS] Test `CompactibleFileStreamLog` directly

2016-11-21 Thread zsxwing
Repository: spark Updated Branches: refs/heads/branch-2.1 6dbe44891 -> aaa2a173a [SPARK-18425][STRUCTURED STREAMING][TESTS] Test `CompactibleFileStreamLog` directly ## What changes were proposed in this pull request? Right now we are testing the most of `CompactibleFileStreamLog` in

spark git commit: [SPARK-18425][STRUCTURED STREAMING][TESTS] Test `CompactibleFileStreamLog` directly

2016-11-21 Thread zsxwing
Repository: spark Updated Branches: refs/heads/master 97a8239a6 -> ebeb0830a [SPARK-18425][STRUCTURED STREAMING][TESTS] Test `CompactibleFileStreamLog` directly ## What changes were proposed in this pull request? Right now we are testing the most of `CompactibleFileStreamLog` in

spark git commit: [SPARK-18493] Add missing python APIs: withWatermark and checkpoint to dataframe

2016-11-21 Thread zsxwing
Repository: spark Updated Branches: refs/heads/master a2d464770 -> 97a8239a6 [SPARK-18493] Add missing python APIs: withWatermark and checkpoint to dataframe ## What changes were proposed in this pull request? This PR adds two of the newly added methods of `Dataset`s to Python:

spark git commit: [SPARK-18187][SQL] CompactibleFileStreamLog should not use "compactInterval" direcly with user setting.

2016-11-18 Thread zsxwing
tested? When restart a stream, we change the 'spark.sql.streaming.fileSource.log.compactInterval' different with the former one. The primary solution to this issue was given by uncleGen Added extensions include an additional metadata field in OffsetSeq and CompactibleFileStreamLog APIs. zs

spark git commit: [SPARK-18459][SPARK-18460][STRUCTUREDSTREAMING] Rename triggerId to batchId and add triggerDetails to json in StreamingQueryStatus (for branch-2.0)

2016-11-16 Thread zsxwing
Repository: spark Updated Branches: refs/heads/branch-2.0 10b36d62a -> 37e6d9930 [SPARK-18459][SPARK-18460][STRUCTUREDSTREAMING] Rename triggerId to batchId and add triggerDetails to json in StreamingQueryStatus (for branch-2.0) This is a fix for branch-2.0 for the earlier PR #15895 ## What

spark git commit: [SPARK-18459][SPARK-18460][STRUCTUREDSTREAMING] Rename triggerId to batchId and add triggerDetails to json in StreamingQueryStatus

2016-11-16 Thread zsxwing
Repository: spark Updated Branches: refs/heads/branch-2.1 c0dbe08d6 -> b86e962c9 [SPARK-18459][SPARK-18460][STRUCTUREDSTREAMING] Rename triggerId to batchId and add triggerDetails to json in StreamingQueryStatus ## What changes were proposed in this pull request? SPARK-18459: triggerId

spark git commit: [SPARK-18459][SPARK-18460][STRUCTUREDSTREAMING] Rename triggerId to batchId and add triggerDetails to json in StreamingQueryStatus

2016-11-16 Thread zsxwing
Repository: spark Updated Branches: refs/heads/master 608ecc512 -> 0048ce7ce [SPARK-18459][SPARK-18460][STRUCTUREDSTREAMING] Rename triggerId to batchId and add triggerDetails to json in StreamingQueryStatus ## What changes were proposed in this pull request? SPARK-18459: triggerId seems

spark git commit: [SPARK-18300][SQL] Fix scala 2.10 build for FoldablePropagation

2016-11-15 Thread zsxwing
Repository: spark Updated Branches: refs/heads/master 3ce057d00 -> 4b35d13ba [SPARK-18300][SQL] Fix scala 2.10 build for FoldablePropagation ## What changes were proposed in this pull request? Commit https://github.com/apache/spark/commit/f14ae4900ad0ed66ba36108b7792d56cd6767a69 broke the

spark git commit: [SPARK-18300][SQL] Fix scala 2.10 build for FoldablePropagation

2016-11-15 Thread zsxwing
Repository: spark Updated Branches: refs/heads/branch-2.1 1126c3194 -> 175c47864 [SPARK-18300][SQL] Fix scala 2.10 build for FoldablePropagation ## What changes were proposed in this pull request? Commit https://github.com/apache/spark/commit/f14ae4900ad0ed66ba36108b7792d56cd6767a69 broke

spark git commit: [SPARK-18423][STREAMING] ReceiverTracker should close checkpoint dir when stopped even if it was not started

2016-11-15 Thread zsxwing
Repository: spark Updated Branches: refs/heads/branch-2.1 b424dc947 -> e469d3bad [SPARK-18423][STREAMING] ReceiverTracker should close checkpoint dir when stopped even if it was not started ## What changes were proposed in this pull request? Several tests are being failed on Windows due to

spark git commit: [SPARK-18423][STREAMING] ReceiverTracker should close checkpoint dir when stopped even if it was not started

2016-11-15 Thread zsxwing
Repository: spark Updated Branches: refs/heads/master 1ae4652b7 -> 503378f10 [SPARK-18423][STREAMING] ReceiverTracker should close checkpoint dir when stopped even if it was not started ## What changes were proposed in this pull request? Several tests are being failed on Windows due to the

spark git commit: [SPARK-13027][STREAMING] Added batch time as a parameter to updateStateByKey

2016-11-15 Thread zsxwing
Repository: spark Updated Branches: refs/heads/branch-2.1 0af94e772 -> 5f7a9af66 [SPARK-13027][STREAMING] Added batch time as a parameter to updateStateByKey Added RDD batch time as an input parameter to the update function in updateStateByKey. Author: Aaditya Ramesh

spark git commit: [SPARK-13027][STREAMING] Added batch time as a parameter to updateStateByKey

2016-11-15 Thread zsxwing
Repository: spark Updated Branches: refs/heads/master 745ab8bc5 -> 6f9e598cc [SPARK-13027][STREAMING] Added batch time as a parameter to updateStateByKey Added RDD batch time as an input parameter to the update function in updateStateByKey. Author: Aaditya Ramesh

spark git commit: [SPARK-17510][STREAMING][KAFKA] config max rate on a per-partition basis

2016-11-14 Thread zsxwing
Repository: spark Updated Branches: refs/heads/master bdfe60ac9 -> 89d1fa58d [SPARK-17510][STREAMING][KAFKA] config max rate on a per-partition basis ## What changes were proposed in this pull request? Allow configuration of max rate on a per-topicpartition basis. ## How was this patch

spark git commit: [SPARK-17510][STREAMING][KAFKA] config max rate on a per-partition basis

2016-11-14 Thread zsxwing
Repository: spark Updated Branches: refs/heads/branch-2.1 3c623d226 -> db691f05c [SPARK-17510][STREAMING][KAFKA] config max rate on a per-partition basis ## What changes were proposed in this pull request? Allow configuration of max rate on a per-topicpartition basis. ## How was this patch

spark git commit: [SPARK-18416][STRUCTURED STREAMING] Fixed temp file leak in state store

2016-11-14 Thread zsxwing
Repository: spark Updated Branches: refs/heads/branch-2.0 666396510 -> c40fcbc14 [SPARK-18416][STRUCTURED STREAMING] Fixed temp file leak in state store ## What changes were proposed in this pull request? StateStore.get() causes temporary files to be created immediately, even if the store

spark git commit: [SPARK-18416][STRUCTURED STREAMING] Fixed temp file leak in state store

2016-11-14 Thread zsxwing
Repository: spark Updated Branches: refs/heads/branch-2.1 c07fe1c59 -> 3c623d226 [SPARK-18416][STRUCTURED STREAMING] Fixed temp file leak in state store ## What changes were proposed in this pull request? StateStore.get() causes temporary files to be created immediately, even if the store

spark git commit: [SPARK-18416][STRUCTURED STREAMING] Fixed temp file leak in state store

2016-11-14 Thread zsxwing
Repository: spark Updated Branches: refs/heads/master 9d07ceee7 -> bdfe60ac9 [SPARK-18416][STRUCTURED STREAMING] Fixed temp file leak in state store ## What changes were proposed in this pull request? StateStore.get() causes temporary files to be created immediately, even if the store is

spark git commit: [SPARK-17829][SQL] Stable format for offset log

2016-11-09 Thread zsxwing
ing/OffsetSuite.scala) Please review https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark before opening a pull request. zsxwing marmbrus Author: Tyson Condie <tcon...@gmail.com> Author: Tyson Condie <tcondie@clash.local> Closes #15626 from tcondie/spark-8360. (cherry pic

spark git commit: [SPARK-17829][SQL] Stable format for offset log

2016-11-09 Thread zsxwing
ing/OffsetSuite.scala) Please review https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark before opening a pull request. zsxwing marmbrus Author: Tyson Condie <tcon...@gmail.com> Author: Tyson Condie <tcondie@clash.local> Closes #15626 from tcondie/spark-8360. Project:

spark git commit: [SPARK-18280][CORE] Fix potential deadlock in `StandaloneSchedulerBackend.dead`

2016-11-08 Thread zsxwing
if launching a new thread to stop the SparkContext. ## How was this patch tested? Jenkins Author: Shixiong Zhu <shixi...@databricks.com> Closes #15775 from zsxwing/SPARK-18280. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spa

spark git commit: [SPARK-18280][CORE] Fix potential deadlock in `StandaloneSchedulerBackend.dead`

2016-11-08 Thread zsxwing
if launching a new thread to stop the SparkContext. ## How was this patch tested? Jenkins Author: Shixiong Zhu <shixi...@databricks.com> Closes #15775 from zsxwing/SPARK-18280. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spa

spark git commit: [SPARK-18280][CORE] Fix potential deadlock in `StandaloneSchedulerBackend.dead`

2016-11-08 Thread zsxwing
if launching a new thread to stop the SparkContext. ## How was this patch tested? Jenkins Author: Shixiong Zhu <shixi...@databricks.com> Closes #15775 from zsxwing/SPARK-18280. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/b6de

spark git commit: [SPARK-18283][STRUCTURED STREAMING][KAFKA] Added test to check whether default starting offset in latest

2016-11-07 Thread zsxwing
Repository: spark Updated Branches: refs/heads/branch-2.0 b5d7217af -> 10525c294 [SPARK-18283][STRUCTURED STREAMING][KAFKA] Added test to check whether default starting offset in latest ## What changes were proposed in this pull request? Added test to check whether default starting offset

spark git commit: [SPARK-18283][STRUCTURED STREAMING][KAFKA] Added test to check whether default starting offset in latest

2016-11-07 Thread zsxwing
Repository: spark Updated Branches: refs/heads/master daa975f4b -> b06c23db9 [SPARK-18283][STRUCTURED STREAMING][KAFKA] Added test to check whether default starting offset in latest ## What changes were proposed in this pull request? Added test to check whether default starting offset in

spark git commit: [SPARK-18283][STRUCTURED STREAMING][KAFKA] Added test to check whether default starting offset in latest

2016-11-07 Thread zsxwing
Repository: spark Updated Branches: refs/heads/branch-2.1 6b332909f -> 7a84edb24 [SPARK-18283][STRUCTURED STREAMING][KAFKA] Added test to check whether default starting offset in latest ## What changes were proposed in this pull request? Added test to check whether default starting offset

spark git commit: [SPARK-18144][SQL] logging StreamingQueryListener$QueryStartedEvent

2016-11-02 Thread zsxwing
All() in the original code is actually calling StreamingQueryListenerBus.postToAll() which has no listener at allwe shall post by sparkListenerBus.postToAll(s) and this.postToAll() to trigger local listeners as well as the listeners registered in LiveListenerBus zsxwing ## How was this patch tes

spark git commit: [SPARK-18144][SQL] logging StreamingQueryListener$QueryStartedEvent

2016-11-02 Thread zsxwing
All() in the original code is actually calling StreamingQueryListenerBus.postToAll() which has no listener at allwe shall post by sparkListenerBus.postToAll(s) and this.postToAll() to trigger local listeners as well as the listeners registered in LiveListenerBus zsxwing ## How was this patch tes

spark git commit: [SPARK-18144][SQL] logging StreamingQueryListener$QueryStartedEvent

2016-11-02 Thread zsxwing
nal code is actually calling StreamingQueryListenerBus.postToAll() which has no listener at allwe shall post by sparkListenerBus.postToAll(s) and this.postToAll() to trigger local listeners as well as the listeners registered in LiveListenerBus zsxwing ## How was this patch tes

spark git commit: [SPARK-18030][TESTS] Fix flaky FileStreamSourceSuite by not deleting the files

2016-10-31 Thread zsxwing
uld not delete files because the source maybe is listing files. This PR just removes the delete actions since they are not necessary. ## How was this patch tested? Jenkins Author: Shixiong Zhu <shixi...@databricks.com> Closes #15699 from zsxwing/SPARK-18030. Project: http://git-wip-us.a

spark git commit: [SPARK-18030][TESTS] Fix flaky FileStreamSourceSuite by not deleting the files

2016-10-31 Thread zsxwing
ata` should not delete files because the source maybe is listing files. This PR just removes the delete actions since they are not necessary. ## How was this patch tested? Jenkins Author: Shixiong Zhu <shixi...@databricks.com> Closes #15699 from zsxwing/SPARK-18030. (cherry picked fr

spark git commit: [SPARK-18164][SQL] ForeachSink should fail the Spark job if `process` throws exception

2016-10-28 Thread zsxwing
How was this patch tested? The fixed unit test. Author: Shixiong Zhu <shixi...@databricks.com> Closes #15674 from zsxwing/foreach-sink-error. (cherry picked from commit 59cccbda489f25add3e10997e950de7e88704aa7) Signed-off-by: Shixiong Zhu <shixi...@databricks.com> Project:

spark git commit: [SPARK-18164][SQL] ForeachSink should fail the Spark job if `process` throws exception

2016-10-28 Thread zsxwing
How was this patch tested? The fixed unit test. Author: Shixiong Zhu <shixi...@databricks.com> Closes #15674 from zsxwing/foreach-sink-error. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/59cccbda Tree: http:

spark git commit: [SPARK-16963][SQL] Fix test "StreamExecution metadata garbage collection"

2016-10-27 Thread zsxwing
he file list API doesn't guarantee any order of the return list. ## How was this patch tested? Jenkins Author: Shixiong Zhu <shixi...@databricks.com> Closes #15661 from zsxwing/fix-StreamingQuerySuite. (cherry picked from commit 79fd0cc0584e48fb021c4237877b15abbffb319a) Signed-off-by:

spark git commit: [SPARK-16963][SQL] Fix test "StreamExecution metadata garbage collection"

2016-10-27 Thread zsxwing
ile list API doesn't guarantee any order of the return list. ## How was this patch tested? Jenkins Author: Shixiong Zhu <shixi...@databricks.com> Closes #15661 from zsxwing/fix-StreamingQuerySuite. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.

spark git commit: [SPARK-17813][SQL][KAFKA] Maximum data per trigger

2016-10-27 Thread zsxwing
Repository: spark Updated Branches: refs/heads/branch-2.0 1a4be51d6 -> 6fb1f735f [SPARK-17813][SQL][KAFKA] Maximum data per trigger ## What changes were proposed in this pull request? maxOffsetsPerTrigger option for rate limiting, proportionally based on volume of different topicpartitions.

spark git commit: [SPARK-17813][SQL][KAFKA] Maximum data per trigger

2016-10-27 Thread zsxwing
Repository: spark Updated Branches: refs/heads/master 701a9d361 -> 104232580 [SPARK-17813][SQL][KAFKA] Maximum data per trigger ## What changes were proposed in this pull request? maxOffsetsPerTrigger option for rate limiting, proportionally based on volume of different topicpartitions. ##

spark git commit: [SPARK-16963][STREAMING][SQL] Changes to Source trait and related implementation classes

2016-10-26 Thread zsxwing
Repository: spark Updated Branches: refs/heads/branch-2.0 72b3cff33 -> ea205e376 [SPARK-16963][STREAMING][SQL] Changes to Source trait and related implementation classes ## What changes were proposed in this pull request? This PR contains changes to the Source trait such that the scheduler

spark git commit: [SPARK-16963][STREAMING][SQL] Changes to Source trait and related implementation classes

2016-10-26 Thread zsxwing
Repository: spark Updated Branches: refs/heads/master a76846cfb -> 5b27598ff [SPARK-16963][STREAMING][SQL] Changes to Source trait and related implementation classes ## What changes were proposed in this pull request? This PR contains changes to the Source trait such that the scheduler can

spark git commit: [SPARK-13747][SQL] Fix concurrent executions in ForkJoinPool for SQL (branch 2.0)

2016-10-26 Thread zsxwing
hor: Shixiong Zhu <shixi...@databricks.com> Closes #15646 from zsxwing/SPARK-13747-2.0. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/76b71eef Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/76b71eef Diff: http:

spark git commit: [SPARK-18104][DOC] Don't build KafkaSource doc

2016-10-26 Thread zsxwing
All KafkaSource APIs are internal. ## How was this patch tested? Verified manually. Author: Shixiong Zhu <shixi...@databricks.com> Closes #15630 from zsxwing/kafka-unidoc. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit

spark git commit: [SPARK-18104][DOC] Don't build KafkaSource doc

2016-10-26 Thread zsxwing
rce. All KafkaSource APIs are internal. ## How was this patch tested? Verified manually. Author: Shixiong Zhu <shixi...@databricks.com> Closes #15630 from zsxwing/kafka-unidoc. (cherry picked from commit 7d10631c16b980adf1f55378c128436310daed65) Signed-off-by: Shixiong Zhu <shixi...@da

spark git commit: [SPARK-13747][SQL] Fix concurrent executions in ForkJoinPool for SQL

2016-10-26 Thread zsxwing
ted? Jenkins Author: Shixiong Zhu <shixi...@databricks.com> Closes #15520 from zsxwing/SPARK-13747. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/7ac70e7b Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/7ac7

spark git commit: [SPARK-16304] LinkageError should not crash Spark executor

2016-10-26 Thread zsxwing
Repository: spark Updated Branches: refs/heads/branch-2.0 b4a7b6551 -> 773fbfef1 [SPARK-16304] LinkageError should not crash Spark executor ## What changes were proposed in this pull request? This patch updates the failure handling logic so Spark executor does not crash when seeing

spark git commit: [SPARK-17894][HOTFIX] Fix broken build from

2016-10-24 Thread zsxwing
Repository: spark Updated Branches: refs/heads/master d479c5262 -> 483c37c58 [SPARK-17894][HOTFIX] Fix broken build from The named parameter in an overridden class isn't supported in Scala 2.10 so was breaking the build. cc zsxwing Author: Kay Ousterhout <kayousterh...@gmail.com>

spark git commit: [SPARK-17624][SQL][STREAMING][TEST] Fixed flaky StateStoreSuite.maintenance

2016-10-24 Thread zsxwing
Repository: spark Updated Branches: refs/heads/master 81d6933e7 -> 407c3cedf [SPARK-17624][SQL][STREAMING][TEST] Fixed flaky StateStoreSuite.maintenance ## What changes were proposed in this pull request? The reason for the flakiness was follows. The test starts the maintenance background

spark git commit: [SPARK-17624][SQL][STREAMING][TEST] Fixed flaky StateStoreSuite.maintenance

2016-10-24 Thread zsxwing
Repository: spark Updated Branches: refs/heads/branch-2.0 bad15bcdf -> 1c1e847bc [SPARK-17624][SQL][STREAMING][TEST] Fixed flaky StateStoreSuite.maintenance ## What changes were proposed in this pull request? The reason for the flakiness was follows. The test starts the maintenance

spark git commit: [SPARK-18044][STREAMING] FileStreamSource should not infer partitions in every batch

2016-10-24 Thread zsxwing
Repository: spark Updated Branches: refs/heads/branch-2.0 aef65ac02 -> bad15bcdf [SPARK-18044][STREAMING] FileStreamSource should not infer partitions in every batch ## What changes were proposed in this pull request? In `FileStreamSource.getBatch`, we will create a `DataSource` with

spark git commit: [SPARK-17153][SQL] Should read partition data when reading new files in filestream without globbing

2016-10-24 Thread zsxwing
Repository: spark Updated Branches: refs/heads/branch-2.0 064db176d -> aef65ac02 [SPARK-17153][SQL] Should read partition data when reading new files in filestream without globbing ## What changes were proposed in this pull request? When reading file stream with non-globbing path, the

spark git commit: [STREAMING][KAFKA][DOC] clarify kafka settings needed for larger batches

2016-10-21 Thread zsxwing
Repository: spark Updated Branches: refs/heads/branch-2.0 3e9840f1d -> d3c78c4f3 [STREAMING][KAFKA][DOC] clarify kafka settings needed for larger batches ## What changes were proposed in this pull request? Minor doc change to mention kafka configuration for larger spark batches. ## How was

spark git commit: [STREAMING][KAFKA][DOC] clarify kafka settings needed for larger batches

2016-10-21 Thread zsxwing
Repository: spark Updated Branches: refs/heads/master 268ccb9a4 -> c9720b219 [STREAMING][KAFKA][DOC] clarify kafka settings needed for larger batches ## What changes were proposed in this pull request? Minor doc change to mention kafka configuration for larger spark batches. ## How was this

spark git commit: [SPARK-17812][SQL][KAFKA] Assign and specific startingOffsets for structured stream

2016-10-21 Thread zsxwing
Repository: spark Updated Branches: refs/heads/branch-2.0 b113b5d9f -> 3e9840f1d [SPARK-17812][SQL][KAFKA] Assign and specific startingOffsets for structured stream ## What changes were proposed in this pull request? startingOffsets takes specific per-topicpartition offsets as a json

spark git commit: [SPARK-17812][SQL][KAFKA] Assign and specific startingOffsets for structured stream

2016-10-21 Thread zsxwing
Repository: spark Updated Branches: refs/heads/master 140570252 -> 268ccb9a4 [SPARK-17812][SQL][KAFKA] Assign and specific startingOffsets for structured stream ## What changes were proposed in this pull request? startingOffsets takes specific per-topicpartition offsets as a json argument,

spark git commit: [SPARK-18044][STREAMING] FileStreamSource should not infer partitions in every batch

2016-10-21 Thread zsxwing
Repository: spark Updated Branches: refs/heads/master c1f344f1a -> 140570252 [SPARK-18044][STREAMING] FileStreamSource should not infer partitions in every batch ## What changes were proposed in this pull request? In `FileStreamSource.getBatch`, we will create a `DataSource` with specified

spark git commit: [SPARK-17929][CORE] Fix deadlock when CoarseGrainedSchedulerBackend reset

2016-10-21 Thread zsxwing
Repository: spark Updated Branches: refs/heads/master 7a531e305 -> c1f344f1a [SPARK-17929][CORE] Fix deadlock when CoarseGrainedSchedulerBackend reset ## What changes were proposed in this pull request? https://issues.apache.org/jira/browse/SPARK-17929 Now `CoarseGrainedSchedulerBackend`

spark git commit: [SPARK-17929][CORE] Fix deadlock when CoarseGrainedSchedulerBackend reset

2016-10-21 Thread zsxwing
Repository: spark Updated Branches: refs/heads/branch-2.0 af2e6e0c9 -> b113b5d9f [SPARK-17929][CORE] Fix deadlock when CoarseGrainedSchedulerBackend reset ## What changes were proposed in this pull request? https://issues.apache.org/jira/browse/SPARK-17929 Now

spark git commit: [SPARK-18030][TESTS] Adds more checks to collect more info about FileStreamSourceSuite failure

2016-10-20 Thread zsxwing
ore info. ## How was this patch tested? Jenkins Author: Shixiong Zhu <shixi...@databricks.com> Closes #15577 from zsxwing/SPARK-18030-debug. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/1bb99c48 Tree: http:

spark git commit: [SPARK-17999][KAFKA][SQL] Add getPreferredLocations for KafkaSourceRDD

2016-10-20 Thread zsxwing
Repository: spark Updated Branches: refs/heads/branch-2.0 4131623a8 -> e8923d21d [SPARK-17999][KAFKA][SQL] Add getPreferredLocations for KafkaSourceRDD ## What changes were proposed in this pull request? The newly implemented Structured Streaming `KafkaSource` did calculate the preferred

spark git commit: [SPARK-17999][KAFKA][SQL] Add getPreferredLocations for KafkaSourceRDD

2016-10-20 Thread zsxwing
Repository: spark Updated Branches: refs/heads/master 84b245f2d -> 947f4f252 [SPARK-17999][KAFKA][SQL] Add getPreferredLocations for KafkaSourceRDD ## What changes were proposed in this pull request? The newly implemented Structured Streaming `KafkaSource` did calculate the preferred

spark git commit: [SPARK-17711][TEST-HADOOP2.2] Fix hadoop2.2 compilation error

2016-10-18 Thread zsxwing
Repository: spark Updated Branches: refs/heads/master 5f20ae039 -> 2629cd746 [SPARK-17711][TEST-HADOOP2.2] Fix hadoop2.2 compilation error ## What changes were proposed in this pull request? Fix hadoop2.2 compilation error. ## How was this patch tested? Existing tests. cc tdas zsxw

spark git commit: [SPARK-17711][TEST-HADOOP2.2] Fix hadoop2.2 compilation error

2016-10-18 Thread zsxwing
Repository: spark Updated Branches: refs/heads/branch-2.0 99943bf69 -> 3796a98cf [SPARK-17711][TEST-HADOOP2.2] Fix hadoop2.2 compilation error ## What changes were proposed in this pull request? Fix hadoop2.2 compilation error. ## How was this patch tested? Existing tests. cc tdas zsxw

spark git commit: [SPARK-17930][CORE] The SerializerInstance instance used when deserializing a TaskResult is not reused

2016-10-18 Thread zsxwing
Repository: spark Updated Branches: refs/heads/master 20dd11096 -> 4518642ab [SPARK-17930][CORE] The SerializerInstance instance used when deserializing a TaskResult is not reused ## What changes were proposed in this pull request? The following code is called when the DirectTaskResult

spark git commit: [SPARK-17839][CORE] Use Nio's directbuffer instead of BufferedInputStream in order to avoid additional copy from os buffer cache to user buffer

2016-10-17 Thread zsxwing
Repository: spark Updated Branches: refs/heads/master e3bf37fa3 -> c7ac027d5 [SPARK-17839][CORE] Use Nio's directbuffer instead of BufferedInputStream in order to avoid additional copy from os buffer cache to user buffer ## What changes were proposed in this pull request? Currently we use

spark git commit: [SPARK-17678][REPL][BRANCH-1.6] Honor spark.replClassServer.port in scala-2.11 repl

2016-10-13 Thread zsxwing
Repository: spark Updated Branches: refs/heads/branch-1.6 585c5657f -> 18b173cfc [SPARK-17678][REPL][BRANCH-1.6] Honor spark.replClassServer.port in scala-2.11 repl ## What changes were proposed in this pull request? Spark 1.6 Scala-2.11 repl doesn't honor "spark.replClassServer.port"

spark git commit: [SPARK-17834][SQL] Fetch the earliest offsets manually in KafkaSource instead of counting on KafkaConsumer

2016-10-13 Thread zsxwing
ate the partition offsets, this PR just calls `seekToBeginning` to manually set the earliest offsets for the KafkaSource initial offsets. ## How was this patch tested? Existing tests. Author: Shixiong Zhu <shixi...@databricks.com> Closes #15397 from zsxwing/SPARK-17834. (cherry picked fr

spark git commit: [SPARK-17876] Write StructuredStreaming WAL to a stream instead of materializing all at once

2016-10-12 Thread zsxwing
Repository: spark Updated Branches: refs/heads/branch-2.0 5903dabc5 -> ab00e410c [SPARK-17876] Write StructuredStreaming WAL to a stream instead of materializing all at once ## What changes were proposed in this pull request? The CompactibleFileStreamLog materializes the whole metadata log

spark git commit: [SPARK-17876] Write StructuredStreaming WAL to a stream instead of materializing all at once

2016-10-12 Thread zsxwing
Repository: spark Updated Branches: refs/heads/master 21cb59f1c -> edeb51a39 [SPARK-17876] Write StructuredStreaming WAL to a stream instead of materializing all at once ## What changes were proposed in this pull request? The CompactibleFileStreamLog materializes the whole metadata log in

spark git commit: [SPARK-17782][STREAMING][KAFKA] alternative eliminate race condition of poll twice

2016-10-12 Thread zsxwing
Repository: spark Updated Branches: refs/heads/branch-2.0 d55ba3063 -> 050b8177e [SPARK-17782][STREAMING][KAFKA] alternative eliminate race condition of poll twice ## What changes were proposed in this pull request? Alternative approach to https://github.com/apache/spark/pull/15387 Author:

spark git commit: [SPARK-17782][STREAMING][KAFKA] alternative eliminate race condition of poll twice

2016-10-12 Thread zsxwing
Repository: spark Updated Branches: refs/heads/master 9ce7d3e54 -> f9a56a153 [SPARK-17782][STREAMING][KAFKA] alternative eliminate race condition of poll twice ## What changes were proposed in this pull request? Alternative approach to https://github.com/apache/spark/pull/15387 Author:

spark git commit: [SPARK-17816][CORE][BRANCH-2.0] Fix ConcurrentModificationException issue in BlockStatusesAccumulator

2016-10-11 Thread zsxwing
Repository: spark Updated Branches: refs/heads/branch-2.0 a6b5e1dcc -> 5ec3e6680 [SPARK-17816][CORE][BRANCH-2.0] Fix ConcurrentModificationException issue in BlockStatusesAccumulator ## What changes were proposed in this pull request? Replaced `BlockStatusesAccumulator` with

spark git commit: [SPARK-17816][CORE] Fix ConcurrentModificationException issue in BlockStatusesAccumulator

2016-10-10 Thread zsxwing
Repository: spark Updated Branches: refs/heads/master 90217f9de -> 19a5bae47 [SPARK-17816][CORE] Fix ConcurrentModificationException issue in BlockStatusesAccumulator ## What changes were proposed in this pull request? Change the BlockStatusesAccumulator to return immutable object when value

spark git commit: [SPARK-17738][TEST] Fix flaky test in ColumnTypeSuite

2016-10-10 Thread zsxwing
Repository: spark Updated Branches: refs/heads/branch-2.0 d719e9a08 -> ff9f5bbf1 [SPARK-17738][TEST] Fix flaky test in ColumnTypeSuite ## What changes were proposed in this pull request? The default buffer size is not big enough for randomly generated MapType. ## How was this patch tested?

spark git commit: [SPARK-17738][TEST] Fix flaky test in ColumnTypeSuite

2016-10-10 Thread zsxwing
Repository: spark Updated Branches: refs/heads/master 03c40202f -> d5ec4a3e0 [SPARK-17738][TEST] Fix flaky test in ColumnTypeSuite ## What changes were proposed in this pull request? The default buffer size is not big enough for randomly generated MapType. ## How was this patch tested? Ran

spark git commit: [SPARK-17782][STREAMING][BUILD] Add Kafka 0.10 project to build modules

2016-10-07 Thread zsxwing
Repository: spark Updated Branches: refs/heads/branch-2.0 f460a199e -> a84d8ef37 [SPARK-17782][STREAMING][BUILD] Add Kafka 0.10 project to build modules ## What changes were proposed in this pull request? This PR adds the Kafka 0.10 subproject to the build infrastructure. This makes sure

[2/2] spark git commit: [SPARK-17346][SQL][TEST-MAVEN] Add Kafka source for Structured Streaming (branch 2.0)

2016-10-07 Thread zsxwing
/b678e465afa417780b54db0fbbaa311621311f15 into branch 2.0. The only difference is the Spark version in pom file. ## How was this patch tested? Jenkins. Author: Shixiong Zhu <shixi...@databricks.com> Closes #15367 from zsxwing/kafka-source-branch-2.0. Project: http://git-wip-us.apache.org/repos/asf/spar

[1/2] spark git commit: [SPARK-17346][SQL][TEST-MAVEN] Add Kafka source for Structured Streaming (branch 2.0)

2016-10-07 Thread zsxwing
Repository: spark Updated Branches: refs/heads/branch-2.0 9f2eb27a4 -> f460a199e http://git-wip-us.apache.org/repos/asf/spark/blob/f460a199/external/kafka-0-10-sql/src/test/scala/org/apache/spark/sql/kafka010/KafkaTestUtils.scala

spark git commit: [SPARK-17707][WEBUI] Web UI prevents spark-submit application to be finished

2016-10-07 Thread zsxwing
Repository: spark Updated Branches: refs/heads/branch-2.0 3487b0203 -> 9f2eb27a4 [SPARK-17707][WEBUI] Web UI prevents spark-submit application to be finished This expands calls to Jetty's simple `ServerConnector` constructor to explicitly specify a `ScheduledExecutorScheduler` that makes

spark git commit: [SPARK-17707][WEBUI] Web UI prevents spark-submit application to be finished

2016-10-07 Thread zsxwing
Repository: spark Updated Branches: refs/heads/master dd16b52cf -> cff560755 [SPARK-17707][WEBUI] Web UI prevents spark-submit application to be finished ## What changes were proposed in this pull request? This expands calls to Jetty's simple `ServerConnector` constructor to explicitly

spark git commit: [SPARK-17798][SQL] Remove redundant Experimental annotations in sql.streaming

2016-10-06 Thread zsxwing
Repository: spark Updated Branches: refs/heads/branch-2.0 1c2dff1ee -> 225372adf [SPARK-17798][SQL] Remove redundant Experimental annotations in sql.streaming ## What changes were proposed in this pull request? I was looking through API annotations to catch mislabeled APIs, and realized

spark git commit: [SPARK-17798][SQL] Remove redundant Experimental annotations in sql.streaming

2016-10-06 Thread zsxwing
Repository: spark Updated Branches: refs/heads/master 92b7e5728 -> 79accf45a [SPARK-17798][SQL] Remove redundant Experimental annotations in sql.streaming ## What changes were proposed in this pull request? I was looking through API annotations to catch mislabeled APIs, and realized

spark git commit: [SPARK-17643] Remove comparable requirement from Offset (backport for branch-2.0)

2016-10-05 Thread zsxwing
mit/988c71457354b0a443471f501cef544a85b1a76a to branch-2.0 ## How was this patch tested? Jenkins Author: Michael Armbrust <mich...@databricks.com> Closes #15362 from zsxwing/SPARK-17643-2.0. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/1c2d

spark git commit: [SPARK-17778][TESTS] Mock SparkContext to reduce memory usage of BlockManagerSuite

2016-10-05 Thread zsxwing
How was this patch tested? Jenkins Author: Shixiong Zhu <shixi...@databricks.com> Closes #15350 from zsxwing/SPARK-17778. (cherry picked from commit 221b418b1c9db7b04c600b6300d18b034a4f444e) Signed-off-by: Shixiong Zhu <shixi...@databricks.com> Project: http://git-wip-us.apache.org/repos

spark git commit: [SPARK-17778][TESTS] Mock SparkContext to reduce memory usage of BlockManagerSuite

2016-10-05 Thread zsxwing
How was this patch tested? Jenkins Author: Shixiong Zhu <shixi...@databricks.com> Closes #15350 from zsxwing/SPARK-17778. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/221b418b Tree: http://git-wip-us.apache.org/repos/asf/s

spark git commit: [SPARK-17696][SPARK-12330][CORE] Partial backport of to branch-1.6.

2016-09-28 Thread zsxwing
Repository: spark Updated Branches: refs/heads/branch-1.6 e2ce0caed -> b999fa43e [SPARK-17696][SPARK-12330][CORE] Partial backport of to branch-1.6. >From the original commit message: This PR also fixes a regression caused by [SPARK-10987] whereby submitting a shutdown causes a race between

spark git commit: [SPARK-17644][CORE] Do not add failedStages when abortStage for fetch failure

2016-09-28 Thread zsxwing
Repository: spark Updated Branches: refs/heads/branch-2.0 4d73d5cd8 -> 4c694e452 [SPARK-17644][CORE] Do not add failedStages when abortStage for fetch failure | Time|Thread 1 , Job1 | Thread 2 , Job2 | |:-:|:-:|:-:| | 1 | abort stage due to

spark git commit: [SPARK-17644][CORE] Do not add failedStages when abortStage for fetch failure

2016-09-28 Thread zsxwing
Repository: spark Updated Branches: refs/heads/master 219003775 -> 46d1203bf [SPARK-17644][CORE] Do not add failedStages when abortStage for fetch failure ## What changes were proposed in this pull request? | Time|Thread 1 , Job1 | Thread 2 , Job2 |

spark git commit: [SPARK-17649][CORE] Log how many Spark events got dropped in AsynchronousListenerBus (branch 1.6)

2016-09-26 Thread zsxwing
ted? Jenkins Author: Shixiong Zhu <shixi...@databricks.com> Closes #15226 from zsxwing/SPARK-17649-branch-1.6. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/7aded55e Tree: http://git-wip-us.apache.org/repos/asf/spark/tree

spark git commit: [SPARK-17649][CORE] Log how many Spark events got dropped in LiveListenerBus

2016-09-26 Thread zsxwing
get insights on how to set a correct event queue size. ## How was this patch tested? Jenkins Author: Shixiong Zhu <shixi...@databricks.com> Closes #15220 from zsxwing/SPARK-17649. (cherry picked from commit bde85f8b70138a51052b613664facbc981378c38) Signed-off-by: Shixiong Z

spark git commit: [SPARK-17649][CORE] Log how many Spark events got dropped in LiveListenerBus

2016-09-26 Thread zsxwing
get insights on how to set a correct event queue size. ## How was this patch tested? Jenkins Author: Shixiong Zhu <shixi...@databricks.com> Closes #15220 from zsxwing/SPARK-17649. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spa

spark git commit: [SPARK-17650] malformed url's throw exceptions before bricking Executors

2016-09-25 Thread zsxwing
Repository: spark Updated Branches: refs/heads/master de333d121 -> 59d87d240 [SPARK-17650] malformed url's throw exceptions before bricking Executors ## What changes were proposed in this pull request? When a malformed URL was sent to Executors through `sc.addJar` and `sc.addFile`, the

spark git commit: [SPARK-17650] malformed url's throw exceptions before bricking Executors

2016-09-25 Thread zsxwing
Repository: spark Updated Branches: refs/heads/branch-2.0 ed545763a -> 88ba2e1d0 [SPARK-17650] malformed url's throw exceptions before bricking Executors ## What changes were proposed in this pull request? When a malformed URL was sent to Executors through `sc.addJar` and `sc.addFile`, the

spark git commit: [SPARK-15703][SCHEDULER][CORE][WEBUI] Make ListenerBus event queue size configurable (branch 2.0)

2016-09-23 Thread zsxwing
ted? Jenkins. Author: Dhruve Ashar <dhruveas...@gmail.com> Closes #15222 from zsxwing/SPARK-15703-2.0. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/9e91a100 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/9e91

spark git commit: [SPARK-17640][SQL] Avoid using -1 as the default batchId for FileStreamSource.FileEntry

2016-09-23 Thread zsxwing
try so that we can make sure not writing any FileEntry(..., batchId = -1) into the log. This also avoids people misusing it in future (#15203 is an example). ## How was this patch tested? Jenkins. Author: Shixiong Zhu <shixi...@databricks.com> Closes #15206 from zsxwing/cleanup. (cher

spark git commit: [SPARK-17640][SQL] Avoid using -1 as the default batchId for FileStreamSource.FileEntry

2016-09-23 Thread zsxwing
so that we can make sure not writing any FileEntry(..., batchId = -1) into the log. This also avoids people misusing it in future (#15203 is an example). ## How was this patch tested? Jenkins. Author: Shixiong Zhu <shixi...@databricks.com> Closes #15206 from zsxwing/cleanup. Project: h

spark git commit: [SPARK-17569][SPARK-17569][TEST] Make the unit test added for work again

2016-09-22 Thread zsxwing
Repository: spark Updated Branches: refs/heads/master f4f6bd8c9 -> a16619683 [SPARK-17569][SPARK-17569][TEST] Make the unit test added for work again ## What changes were proposed in this pull request? A [PR](https://github.com/apache/spark/commit/a6aade0042d9c065669f46d2dac40ec6ce361e63)

spark git commit: [SPARK-17638][STREAMING] Stop JVM StreamingContext when the Python process is dead

2016-09-22 Thread zsxwing
ing. Hence we will see a lot of Py4jException before the JVM process exits. It's better to stop the JVM StreamingContext to avoid those annoying logs. ## How was this patch tested? Jenkins Author: Shixiong Zhu <shixi...@databricks.com> Closes #15201 from zsxwing/stop-jvm-ssc. (cherry pi

spark git commit: [SPARK-17638][STREAMING] Stop JVM StreamingContext when the Python process is dead

2016-09-22 Thread zsxwing
ing. Hence we will see a lot of Py4jException before the JVM process exits. It's better to stop the JVM StreamingContext to avoid those annoying logs. ## How was this patch tested? Jenkins Author: Shixiong Zhu <shixi...@databricks.com> Closes #15201 from zsxwing/stop-jvm-ssc. Project: http:

spark git commit: [SPARK-17569] Make StructuredStreaming FileStreamSource batch generation faster

2016-09-21 Thread zsxwing
Repository: spark Updated Branches: refs/heads/master 8c3ee2bc4 -> 7cbe21644 [SPARK-17569] Make StructuredStreaming FileStreamSource batch generation faster ## What changes were proposed in this pull request? While getting the batch for a `FileStreamSource` in StructuredStreaming, we know

spark git commit: [SPARK-4563][CORE] Allow driver to advertise a different network address.

2016-09-21 Thread zsxwing
Repository: spark Updated Branches: refs/heads/master b4a4421b6 -> 2cd1bfa4f [SPARK-4563][CORE] Allow driver to advertise a different network address. The goal of this feature is to allow the Spark driver to run in an isolated environment, such as a docker container, and be able to use the

<    1   2   3   4   5   6   7   8   >