[spark] branch branch-3.0 updated: [MINOR][SQL] Update the DataFrameWriter.bucketBy comment

dongjoon Tue, 17 Mar 2020 00:55:22 -0700

This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git



The following commit(s) were added to refs/heads/branch-3.0 by this push:
     new 26ea213  [MINOR][SQL] Update the DataFrameWriter.bucketBy comment
26ea213 is described below

commit 26ea213f3c4f2acb07045bf0f6b476ddfb635436
Author: Takeshi Yamamuro <[email protected]>
AuthorDate: Tue Mar 17 00:52:45 2020 -0700

    [MINOR][SQL] Update the DataFrameWriter.bucketBy comment
    
    ### What changes were proposed in this pull request?
    
    This PR updates the `DataFrameWriter.bucketBy` comment to state clearly that the bucketing scheme is Spark-specific.
    
    I saw questions from users about the current bucketing compatibility with Hive in [SPARK-31162](https://issues.apache.org/jira/browse/SPARK-31162?focusedCommentId=17060408&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17060408) and [SPARK-17495](https://issues.apache.org/jira/browse/SPARK-17495?focusedCommentId=17059847&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17059847), and IMHO the comment is a bit confusing [...]
    
    ### Why are the changes needed?
    
    To make the bucketing behavior easier for users to understand.
    
    ### Does this PR introduce any user-facing change?
    
    No.
    
    ### How was this patch tested?
    
    N/A
    
    Closes #27930 from maropu/UpdateBucketByComment.
    
    Authored-by: Takeshi Yamamuro <[email protected]>
    Signed-off-by: Dongjoon Hyun <[email protected]>
    (cherry picked from commit 124b4ce2e6e8f84294f8fc13d3e731a82325dacb)
    Signed-off-by: Dongjoon Hyun <[email protected]>
---
 sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala b/sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala
index 22b26ca..6946c1f 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala
@@ -198,7 +198,8 @@ final class DataFrameWriter[T] private[sql](ds: Dataset[T]) {
 
   /**
    * Buckets the output by the given columns. If specified, the output is laid out on the file
-   * system similar to Hive's bucketing scheme.
+   * system similar to Hive's bucketing scheme, but with a different bucket hash function
+   * and is not compatible with Hive's bucketing.
    *
    * This is applicable for all file-based data sources (e.g. Parquet, JSON) starting with Spark
    * 2.1.0.
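
The updated comment says the layout is incompatible with Hive's because the bucket hash function differs. As a rough illustration of why a different hash function alone makes two bucketed layouts incompatible, here is a minimal, self-contained Scala sketch. The names `BucketHashSketch`, `bucketA`, and `bucketB`, the seed value, and both hash choices are assumptions made for this example. Neither function is Spark's or Hive's actual implementation.

```scala
import scala.util.hashing.MurmurHash3

object BucketHashSketch {
  /** Non-negative remainder, so negative hash codes still map to a valid bucket. */
  def pmod(hash: Int, numBuckets: Int): Int = {
    val r = hash % numBuckets
    if (r < 0) r + numBuckets else r
  }

  /** Hypothetical writer A: uses the integer key itself as its hash code. */
  def bucketA(key: Int, numBuckets: Int): Int = pmod(key, numBuckets)

  /** Hypothetical writer B: runs the key through a Murmur3-style mix first. */
  def bucketB(key: Int, numBuckets: Int): Int = {
    val seed = 42 // arbitrary seed chosen for this sketch
    val hash = MurmurHash3.finalizeHash(MurmurHash3.mix(seed, key), 1)
    pmod(hash, numBuckets)
  }

  def main(args: Array[String]): Unit = {
    val numBuckets = 4
    // Print the bucket each writer would pick for the same keys.
    for (key <- 0 until 8) {
      println(s"key=$key writerA=${bucketA(key, numBuckets)} writerB=${bucketB(key, numBuckets)}")
    }
  }
}
```

Both writers produce the same number of buckets, but a reader that assumes one hash function cannot rely on finding a given key in the bucket file chosen by the other, so optimizations that depend on the bucket layout (such as bucket pruning or shuffle-free joins) would not be safe across the two.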


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
