[jira] [Resolved] (SPARK-48138) Disable a flaky `SparkSessionE2ESuite.interrupt tag` test

2024-05-05 Thread Yang Jie (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48138?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yang Jie resolved SPARK-48138.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46396
[https://github.com/apache/spark/pull/46396]

> Disable a flaky `SparkSessionE2ESuite.interrupt tag` test
> -
>
> Key: SPARK-48138
> URL: https://issues.apache.org/jira/browse/SPARK-48138
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, Tests
>Affects Versions: 4.0.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> - https://github.com/apache/spark/actions/runs/8962353911/job/24611130573 
> (Master, 5/5)
> - https://github.com/apache/spark/actions/runs/8948176536/job/24581022674 
> (Master, 5/4)






[jira] [Assigned] (SPARK-48138) Disable a flaky `SparkSessionE2ESuite.interrupt tag` test

2024-05-05 Thread Yang Jie (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48138?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yang Jie reassigned SPARK-48138:


Assignee: Dongjoon Hyun

> Disable a flaky `SparkSessionE2ESuite.interrupt tag` test
> -
>
> Key: SPARK-48138
> URL: https://issues.apache.org/jira/browse/SPARK-48138
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, Tests
>Affects Versions: 4.0.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
>  Labels: pull-request-available
>
> - https://github.com/apache/spark/actions/runs/8962353911/job/24611130573 
> (Master, 5/5)
> - https://github.com/apache/spark/actions/runs/8948176536/job/24581022674 
> (Master, 5/4)






[jira] [Commented] (SPARK-35531) Can not insert into hive bucket table if create table with upper case schema

2024-05-05 Thread Sandeep Katta (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-35531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17843620#comment-17843620
 ] 

Sandeep Katta commented on SPARK-35531:
---

The bug is tracked at https://issues.apache.org/jira/browse/SPARK-48140.

> Can not insert into hive bucket table if create table with upper case schema
> 
>
> Key: SPARK-35531
> URL: https://issues.apache.org/jira/browse/SPARK-35531
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.0, 3.1.1, 3.2.0
>Reporter: Hongyi Zhang
>Assignee: angerszhu
>Priority: Major
> Fix For: 3.3.0, 3.1.4
>
>
>  
>  
> create table TEST1(
>  V1 BIGINT,
>  S1 INT)
>  partitioned by (PK BIGINT)
>  clustered by (V1)
>  sorted by (S1)
>  into 200 buckets
>  STORED AS PARQUET;
>  
> insert into test1
>  select
>  * from values(1,1,1);
>  
>  
> org.apache.hadoop.hive.ql.metadata.HiveException: Bucket columns V1 is not 
> part of the table columns ([FieldSchema(name:v1, type:bigint, comment:null), 
> FieldSchema(name:s1, type:int, comment:null)]
> org.apache.spark.sql.AnalysisException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Bucket columns V1 is not 
> part of the table columns ([FieldSchema(name:v1, type:bigint, comment:null), 
> FieldSchema(name:s1, type:int, comment:null)]
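For reference, a minimal PySpark sketch that reproduces the failure above on an 
affected version; the session setup is an illustrative assumption (it requires 
a Hive-enabled Spark build):

{code:python}
from pyspark.sql import SparkSession

# Hive support is required so the table goes through the Hive metastore path.
spark = SparkSession.builder.enableHiveSupport().getOrCreate()

spark.sql("""
    CREATE TABLE TEST1(
      V1 BIGINT,
      S1 INT)
    PARTITIONED BY (PK BIGINT)
    CLUSTERED BY (V1)
    SORTED BY (S1)
    INTO 200 BUCKETS
    STORED AS PARQUET
""")

# On affected versions this raises an AnalysisException wrapping
# HiveException: "Bucket columns V1 is not part of the table columns ..."
spark.sql("INSERT INTO test1 SELECT * FROM VALUES (1, 1, 1)")
{code}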






[jira] [Updated] (SPARK-48140) Can not alter bucketed table if create table with upper case schema

2024-05-05 Thread Sandeep Katta (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sandeep Katta updated SPARK-48140:
--
Description: 
Running the SQL commands below throws an exception:

 
CREATE TABLE TEST1(
V1 BIGINT,
S1 INT)
PARTITIONED BY (PK BIGINT)
CLUSTERED BY (V1)
SORTED BY (S1)
INTO 200 BUCKETS
STORED AS PARQUET;

ALTER TABLE test1 SET TBLPROPERTIES ('comment' = 'This is a new comment.');

*Exception:*
{code:java}
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Bucket columns V1 
is not part of the table columns ([FieldSchema(name:v1, type:bigint, 
comment:null), FieldSchema(name:s1, type:int, comment:null)]
        at 
org.apache.hadoop.hive.ql.metadata.Table.setBucketCols(Table.java:552)
        at 
org.apache.spark.sql.hive.client.HiveClientImpl$.toHiveTable(HiveClientImpl.scala:1145)
        at 
org.apache.spark.sql.hive.client.HiveClientImpl.$anonfun$alterTable$1(HiveClientImpl.scala:594)
        at 
scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
        at 
org.apache.spark.sql.hive.client.HiveClientImpl.$anonfun$withHiveState$1(HiveClientImpl.scala:303)
        at 
org.apache.spark.sql.hive.client.HiveClientImpl.liftedTree1$1(HiveClientImpl.scala:234)
        at 
org.apache.spark.sql.hive.client.HiveClientImpl.retryLocked(HiveClientImpl.scala:233)
        at 
org.apache.spark.sql.hive.client.HiveClientImpl.withHiveState(HiveClientImpl.scala:283)
        at 
org.apache.spark.sql.hive.client.HiveClientImpl.alterTable(HiveClientImpl.scala:587)
        at 
org.apache.spark.sql.hive.client.HiveClient.alterTable(HiveClient.scala:124)
        at 
org.apache.spark.sql.hive.client.HiveClient.alterTable$(HiveClient.scala:123)
        at 
org.apache.spark.sql.hive.client.HiveClientImpl.alterTable(HiveClientImpl.scala:93)
        at 
org.apache.spark.sql.hive.HiveExternalCatalog.$anonfun$alterTable$1(HiveExternalCatalog.scala:687)
        at 
scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
        at 
org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:99)
        ... 62 more
{code}

  was:
Running the SQL commands below throws an exception:

 
CREATE TABLE TEST1(
V1 BIGINT,
S1 INT)
PARTITIONED BY (PK BIGINT)
CLUSTERED BY (V1)
SORTED BY (S1)
INTO 200 BUCKETS
STORED AS PARQUET;

ALTER TABLE test1 SET TBLPROPERTIES ('comment' = 'This is a new comment.');


> Can not alter bucketed table if create table with upper case schema
> ---
>
> Key: SPARK-48140
> URL: https://issues.apache.org/jira/browse/SPARK-48140
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: Sandeep Katta
>Priority: Major
>
> Running the SQL commands below throws an exception:
>  
> CREATE TABLE TEST1(
> V1 BIGINT,
> S1 INT)
> PARTITIONED BY (PK BIGINT)
> CLUSTERED BY (V1)
> SORTED BY (S1)
> INTO 200 BUCKETS
> STORED AS PARQUET;
> ALTER TABLE test1 SET TBLPROPERTIES ('comment' = 'This is a new comment.');
> *Exception:*
> {code:java}
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Bucket columns 
> V1 is not part of the table columns ([FieldSchema(name:v1, type:bigint, 
> comment:null), FieldSchema(name:s1, type:int, comment:null)]
>         at 
> org.apache.hadoop.hive.ql.metadata.Table.setBucketCols(Table.java:552)
>         at 
> org.apache.spark.sql.hive.client.HiveClientImpl$.toHiveTable(HiveClientImpl.scala:1145)
>         at 
> org.apache.spark.sql.hive.client.HiveClientImpl.$anonfun$alterTable$1(HiveClientImpl.scala:594)
>         at 
> scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
>         at 
> org.apache.spark.sql.hive.client.HiveClientImpl.$anonfun$withHiveState$1(HiveClientImpl.scala:303)
>         at 
> org.apache.spark.sql.hive.client.HiveClientImpl.liftedTree1$1(HiveClientImpl.scala:234)
>         at 
> org.apache.spark.sql.hive.client.HiveClientImpl.retryLocked(HiveClientImpl.scala:233)
>         at 
> org.apache.spark.sql.hive.client.HiveClientImpl.withHiveState(HiveClientImpl.scala:283)
>         at 
> org.apache.spark.sql.hive.client.HiveClientImpl.alterTable(HiveClientImpl.scala:587)
>         at 
> org.apache.spark.sql.hive.client.HiveClient.alterTable(HiveClient.scala:124)
>         at 
> org.apache.spark.sql.hive.client.HiveClient.alterTable$(HiveClient.scala:123)
>         at 
> org.apache.spark.sql.hive.client.HiveClientImpl.alterTable(HiveClientImpl.scala:93)
>         at 
> org.apache.spark.sql.hive.HiveExternalCatalog.$anonfun$alterTable$1(HiveExternalCatalog.scala:687)
>         at 
> scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
>         at 
> org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:99)
>         ... 62 more
> {code}
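A possible user-side workaround, sketched under the assumption suggested by the 
stack trace (Hive lower-cases the stored column metadata while Spark passes the 
bucket column spec as written): declare the schema in lower case so the names 
match. The table name test1_lc is illustrative:

{code:python}
from pyspark.sql import SparkSession

spark = SparkSession.builder.enableHiveSupport().getOrCreate()

# Same layout as the failing table, but with a lower-case schema.
spark.sql("""
    CREATE TABLE test1_lc(
      v1 BIGINT,
      s1 INT)
    PARTITIONED BY (pk BIGINT)
    CLUSTERED BY (v1)
    SORTED BY (s1)
    INTO 200 BUCKETS
    STORED AS PARQUET
""")

# With matching case, setBucketCols no longer rejects the bucket column,
# so ALTER TABLE should succeed.
spark.sql("ALTER TABLE test1_lc SET TBLPROPERTIES ('comment' = 'This is a new comment.')")
{code}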




[jira] [Created] (SPARK-48140) Can not alter bucketed table if create table with upper case schema

2024-05-05 Thread Sandeep Katta (Jira)
Sandeep Katta created SPARK-48140:
-

 Summary: Can not alter bucketed table if create table with upper 
case schema
 Key: SPARK-48140
 URL: https://issues.apache.org/jira/browse/SPARK-48140
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 3.5.0
Reporter: Sandeep Katta


Running the SQL commands below throws an exception:

 
CREATE TABLE TEST1(
V1 BIGINT,
S1 INT)
PARTITIONED BY (PK BIGINT)
CLUSTERED BY (V1)
SORTED BY (S1)
INTO 200 BUCKETS
STORED AS PARQUET;

ALTER TABLE test1 SET TBLPROPERTIES ('comment' = 'This is a new comment.');






[jira] [Commented] (SPARK-35531) Can not insert into hive bucket table if create table with upper case schema

2024-05-05 Thread Sandeep Katta (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-35531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17843619#comment-17843619
 ] 

Sandeep Katta commented on SPARK-35531:
---

[~angerszhuuu], I see the same issue with the ALTER TABLE command. I tested on 
Spark 3.5.0 and the issue still exists:
{code:java}
CREATE TABLE TEST1(
V1 BIGINT,
S1 INT)
PARTITIONED BY (PK BIGINT)
CLUSTERED BY (V1)
SORTED BY (S1)
INTO 200 BUCKETS
STORED AS PARQUET;

ALTER TABLE test1 SET TBLPROPERTIES ('comment' = 'This is a new comment.'); 
{code}
{code:java}
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Bucket columns V1 
is not part of the table columns ([FieldSchema(name:v1, type:bigint, 
comment:null), FieldSchema(name:s1, type:int, comment:null)]
        at org.apache.hadoop.hive.ql.metadata.Table.setBucketCols(Table.java:552)
        at org.apache.spark.sql.hive.client.HiveClientImpl$.toHiveTable(HiveClientImpl.scala:1145)
        at org.apache.spark.sql.hive.client.HiveClientImpl.$anonfun$alterTable$1(HiveClientImpl.scala:594)
        at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
        at org.apache.spark.sql.hive.client.HiveClientImpl.$anonfun$withHiveState$1(HiveClientImpl.scala:303)
        at org.apache.spark.sql.hive.client.HiveClientImpl.liftedTree1$1(HiveClientImpl.scala:234)
        at org.apache.spark.sql.hive.client.HiveClientImpl.retryLocked(HiveClientImpl.scala:233)
        at org.apache.spark.sql.hive.client.HiveClientImpl.withHiveState(HiveClientImpl.scala:283)
        at org.apache.spark.sql.hive.client.HiveClientImpl.alterTable(HiveClientImpl.scala:587)
        at org.apache.spark.sql.hive.client.HiveClient.alterTable(HiveClient.scala:124)
        at org.apache.spark.sql.hive.client.HiveClient.alterTable$(HiveClient.scala:123)
        at org.apache.spark.sql.hive.client.HiveClientImpl.alterTable(HiveClientImpl.scala:93)
        at org.apache.spark.sql.hive.HiveExternalCatalog.$anonfun$alterTable$1(HiveExternalCatalog.scala:687)
        at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
        at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:99)
        ... 62 more
{code}

> Can not insert into hive bucket table if create table with upper case schema
> 
>
> Key: SPARK-35531
> URL: https://issues.apache.org/jira/browse/SPARK-35531
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.0, 3.1.1, 3.2.0
>Reporter: Hongyi Zhang
>Assignee: angerszhu
>Priority: Major
> Fix For: 3.3.0, 3.1.4
>
>
>  
>  
> create table TEST1(
>  V1 BIGINT,
>  S1 INT)
>  partitioned by (PK BIGINT)
>  clustered by (V1)
>  sorted by (S1)
>  into 200 buckets
>  STORED AS PARQUET;
>  
> insert into test1
>  select
>  * from values(1,1,1);
>  
>  
> org.apache.hadoop.hive.ql.metadata.HiveException: Bucket columns V1 is not 
> part of the table columns ([FieldSchema(name:v1, type:bigint, comment:null), 
> FieldSchema(name:s1, type:int, comment:null)]
> org.apache.spark.sql.AnalysisException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Bucket columns V1 is not 
> part of the table columns ([FieldSchema(name:v1, type:bigint, comment:null), 
> FieldSchema(name:s1, type:int, comment:null)]






[jira] [Updated] (SPARK-48138) Disable a flaky `SparkSessionE2ESuite.interrupt tag` test

2024-05-05 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48138?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-48138:
--
Description: 
- https://github.com/apache/spark/actions/runs/8962353911/job/24611130573 
(Master, 5/5)
- https://github.com/apache/spark/actions/runs/8948176536/job/24581022674 
(Master, 5/4)

> Disable a flaky `SparkSessionE2ESuite.interrupt tag` test
> -
>
> Key: SPARK-48138
> URL: https://issues.apache.org/jira/browse/SPARK-48138
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, Tests
>Affects Versions: 4.0.0
>Reporter: Dongjoon Hyun
>Priority: Major
>  Labels: pull-request-available
>
> - https://github.com/apache/spark/actions/runs/8962353911/job/24611130573 
> (Master, 5/5)
> - https://github.com/apache/spark/actions/runs/8948176536/job/24581022674 
> (Master, 5/4)






[jira] [Updated] (SPARK-48139) Re-enable `SparkSessionE2ESuite.interrupt tag`

2024-05-05 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-48139:
--
Description: (was: - 
https://github.com/apache/spark/actions/runs/8962353911/job/24611130573 
(Master, 5/5)
- https://github.com/apache/spark/actions/runs/8948176536/job/24581022674 
(Master, 5/4))

> Re-enable `SparkSessionE2ESuite.interrupt tag`
> --
>
> Key: SPARK-48139
> URL: https://issues.apache.org/jira/browse/SPARK-48139
> Project: Spark
>  Issue Type: Bug
>  Components: Connect, Tests
>Affects Versions: 4.0.0
>Reporter: Dongjoon Hyun
>Priority: Blocker
>







[jira] [Updated] (SPARK-48139) Re-enable `SparkSessionE2ESuite.interrupt tag`

2024-05-05 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-48139:
--
Description: 
- https://github.com/apache/spark/actions/runs/8962353911/job/24611130573 
(Master, 5/5)
- https://github.com/apache/spark/actions/runs/8948176536/job/24581022674 
(Master, 5/4)

> Re-enable `SparkSessionE2ESuite.interrupt tag`
> --
>
> Key: SPARK-48139
> URL: https://issues.apache.org/jira/browse/SPARK-48139
> Project: Spark
>  Issue Type: Bug
>  Components: Connect, Tests
>Affects Versions: 4.0.0
>Reporter: Dongjoon Hyun
>Priority: Blocker
>
> - https://github.com/apache/spark/actions/runs/8962353911/job/24611130573 
> (Master, 5/5)
> - https://github.com/apache/spark/actions/runs/8948176536/job/24581022674 
> (Master, 5/4)






[jira] [Updated] (SPARK-48138) Disable a flaky `SparkSessionE2ESuite.interrupt tag` test

2024-05-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48138?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-48138:
---
Labels: pull-request-available  (was: )

> Disable a flaky `SparkSessionE2ESuite.interrupt tag` test
> -
>
> Key: SPARK-48138
> URL: https://issues.apache.org/jira/browse/SPARK-48138
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, Tests
>Affects Versions: 4.0.0
>Reporter: Dongjoon Hyun
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Created] (SPARK-48138) Disable a flaky `SparkSessionE2ESuite.interrupt tag` test

2024-05-05 Thread Dongjoon Hyun (Jira)
Dongjoon Hyun created SPARK-48138:
-

 Summary: Disable a flaky `SparkSessionE2ESuite.interrupt tag` test
 Key: SPARK-48138
 URL: https://issues.apache.org/jira/browse/SPARK-48138
 Project: Spark
  Issue Type: Sub-task
  Components: Connect, Tests
Affects Versions: 4.0.0
Reporter: Dongjoon Hyun









[jira] [Resolved] (SPARK-48136) Always upload Spark Connect log files

2024-05-05 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-48136.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46393
[https://github.com/apache/spark/pull/46393]

> Always upload Spark Connect log files
> -
>
> Key: SPARK-48136
> URL: https://issues.apache.org/jira/browse/SPARK-48136
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect, Project Infra, PySpark
>Affects Versions: 4.0.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> We should always upload the log files if the run is not successful.






[jira] [Assigned] (SPARK-47777) Add spark connect test for python streaming data source

2024-05-05 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-47777:


Assignee: (was: Chaoqin Li)

> Add spark connect test for python streaming data source
> ---
>
> Key: SPARK-47777
> URL: https://issues.apache.org/jira/browse/SPARK-47777
> Project: Spark
>  Issue Type: Test
>  Components: PySpark, SS, Tests
>Affects Versions: 3.5.1
>Reporter: Chaoqin Li
>Priority: Major
>  Labels: pull-request-available
>
> Make the Python streaming data source PySpark test also run on Spark Connect. 
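For context, a minimal sketch of how a PySpark test can obtain a Spark Connect 
session; the server URL is an illustrative assumption, and the actual test 
harness in the Spark repository is not reproduced here:

{code:python}
from pyspark.sql import SparkSession

# Point the builder at a running Spark Connect server (URL is an assumption).
spark = SparkSession.builder.remote("sc://localhost:15002").getOrCreate()

# The same DataFrame code then runs over Connect instead of a classic session.
spark.range(3).show()
{code}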






[jira] [Reopened] (SPARK-47777) Add spark connect test for python streaming data source

2024-05-05 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reopened SPARK-47777:
--

Reverted at 
https://github.com/apache/spark/commit/4e69857195a6f95c22f962e3eed950876036c04f

> Add spark connect test for python streaming data source
> ---
>
> Key: SPARK-47777
> URL: https://issues.apache.org/jira/browse/SPARK-47777
> Project: Spark
>  Issue Type: Test
>  Components: PySpark, SS, Tests
>Affects Versions: 3.5.1
>Reporter: Chaoqin Li
>Assignee: Chaoqin Li
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> Make the Python streaming data source PySpark test also run on Spark Connect. 






[jira] [Updated] (SPARK-47777) Add spark connect test for python streaming data source

2024-05-05 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-47777:
-
Fix Version/s: (was: 4.0.0)

> Add spark connect test for python streaming data source
> ---
>
> Key: SPARK-47777
> URL: https://issues.apache.org/jira/browse/SPARK-47777
> Project: Spark
>  Issue Type: Test
>  Components: PySpark, SS, Tests
>Affects Versions: 3.5.1
>Reporter: Chaoqin Li
>Assignee: Chaoqin Li
>Priority: Major
>  Labels: pull-request-available
>
> Make the Python streaming data source PySpark test also run on Spark Connect. 






[jira] [Updated] (SPARK-48135) Run `buf` and `ui` only in PR builders and Java 21 Daily CI

2024-05-05 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-48135:
--
Summary: Run `buf` and `ui` only in PR builders and Java 21 Daily CI  (was: 
Run `but` and `ui` only in PR builders and Java 21 Daily CI)

> Run `buf` and `ui` only in PR builders and Java 21 Daily CI
> ---
>
> Key: SPARK-48135
> URL: https://issues.apache.org/jira/browse/SPARK-48135
> Project: Spark
>  Issue Type: Sub-task
>  Components: Project Infra
>Affects Versions: 4.0.0
>Reporter: Dongjoon Hyun
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Commented] (SPARK-48073) StateStore schema incompatibility between 3.2 and 3.4

2024-05-05 Thread L. C. Hsieh (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-48073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17843583#comment-17843583
 ] 

L. C. Hsieh commented on SPARK-48073:
-

The breaking change was introduced by https://github.com/apache/spark/pull/39615

> StateStore schema incompatibility between 3.2 and 3.4
> -
>
> Key: SPARK-48073
> URL: https://issues.apache.org/jira/browse/SPARK-48073
> Project: Spark
>  Issue Type: Bug
>  Components: Structured Streaming
>Affects Versions: 3.2.4
>Reporter: L. C. Hsieh
>Priority: Major
>
> One of our customers encountered schema incompatibility problems when 
> upgrading a Structured Streaming application from Spark 3.2 to 3.4.
> It seems that in 3.4, `Encoders.bean()` includes properties that have a 
> getter, whether or not they have a setter, whereas in 3.2 only properties 
> with both a getter and a setter are included.
> For example, here are the schemas generated for an AtomicLong property/field 
> by each version:
> 3.2: 
> StructType(StructField(opaque,LongType,true),StructField(plain,LongType,true))
> 3.4: 
> StructType(StructField(acquire,LongType,false),StructField(andDecrement,LongType,false),StructField(andIncrement,LongType,false),StructField(opaque,LongType,false),StructField(plain,LongType,false))
> Note that the nullability flag also changes: the primitive long schema has 
> nullable=true in 3.2 but false in 3.4.
> I am not sure whether the community was already aware of this issue, or 
> whether there is a workaround.
> Thanks.






[jira] [Comment Edited] (SPARK-47353) Mode (all collations)

2024-05-05 Thread Gideon P (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-47353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17843574#comment-17843574
 ] 

Gideon P edited comment on SPARK-47353 at 5/5/24 6:30 PM:
--

[~uros-db] Mode uses an accumulating OpenHashMap to determine the count of each 
unique element. 

Currently, the Apache Spark Mode function uses OpenHashMap to track occurrences 
of each key. However, with collation ordering (where multiple keys might 
compare as equal), a direct hash map will not work, since distinct keys may 
need to be treated as the same value. 

A few approaches to handling collations come to mind:
1. Modify the `Mode.eval` implementation to combine the map further, perhaps by 
turning the map into a list of key-value tuples and folding: if the last 
element of the accumulating list and the current element being folded are equal 
according to the collation, combine their counts. 
2. Another way to combine the map further in `Mode.eval` would be to add all 
the elements of the buffer to a TreeMap built with a collation-sensitive 
Comparator; a TreeMap can efficiently keep track of values and their counts in 
sorted order. 
3. Use a TreeMap instead of OpenHashMap during the accumulation stage. Create a 
trait similar to TypedAggregateWithHashMapAsBuffer and switch to it whenever 
the column's data type is StringType and a session collation is in use; it 
would implement TypedImperativeAggregate. 
4. Codegen fallback might also work in this case. 

To start, I will try approach number 2 (a rough Python analogue of the 
collapsing step is sketched below).

Please let me know if I am on the right track and if you have any ideas! 
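As a loose illustration (not Spark code), a minimal Python sketch of the 
collapse-after-accumulation idea from approaches 1 and 2. Lower-casing stands 
in for a UTF8_BINARY_LCASE-style collation key; Spark's actual collation 
comparators are not modeled here:

{code:python}
from collections import defaultdict

def mode_under_collation(counts, collation_key):
    # Collapse per-key counts so that keys equal under the collation share
    # one bucket, then return a spelling from the largest bucket.
    buckets = defaultdict(int)
    representative = {}
    for key, count in counts.items():
        ck = collation_key(key)
        buckets[ck] += count
        representative.setdefault(ck, key)  # keep the first spelling seen
    best = max(buckets, key=buckets.get)
    return representative[best]

# Counts as the OpenHashMap would have accumulated them.
counts = {"a": 3, "B": 2, "b": 2}
print(mode_under_collation(counts, str.lower))  # 'B' (the 'b' bucket totals 4)
{code}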


was (Author: JIRAUSER304403):
[~uros-db] Mode uses an accumulating OpenHashMap to determine the count of each 
unique element. 

Currently, the Apache Spark Mode function uses OpenHashMap to track occurrences 
of each key. However, with collation ordering (where multiple keys might 
compare as equal), a direct hash map will not work, since distinct keys may 
need to be treated as the same value. 

A few approaches to handling collations come to mind:
1. Modify the `Mode.eval` implementation to combine the map further, perhaps by 
turning the map into a list of key-value tuples and folding: if the last 
element of the accumulating list and the current element being folded are equal 
according to the collation, combine their counts. 
2. Another way to combine the map further in `Mode.eval` would be to add all 
the elements of the buffer to a TreeMap built with a collation-sensitive 
Comparator; a TreeMap can efficiently keep track of values and their counts in 
sorted order. 
3. Use a TreeMap instead of OpenHashMap during the accumulation stage. Create a 
trait similar to TypedAggregateWithHashMapAsBuffer and switch to it whenever 
the column's data type is StringType and a session collation is in use; it 
would implement TypedImperativeAggregate. 

To start, I will try approach number 2.

Please let me know if I am on the right track and if you have any ideas! 

> Mode (all collations)
> -
>
> Key: SPARK-47353
> URL: https://issues.apache.org/jira/browse/SPARK-47353
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Uroš Bojanić
>Priority: Major
>
> Enable collation support for the *Mode* expression in Spark. First confirm 
> the expected behaviour for this expression when given collated strings, then 
> move on to an implementation that can handle strings of all collation types. 
> Implement the corresponding unit tests and E2E SQL tests to reflect how this 
> function should be used with collation in SparkSQL, and feel free to use 
> your chosen Spark SQL Editor to experiment with the existing functions to 
> learn more about how they work. In addition, look into the possible 
> use-cases and implementations of similar functions within other open-source 
> DBMSs, such as [PostgreSQL|https://www.postgresql.org/docs/].
>  
> The goal for this Jira ticket is to implement the *Mode* expression so it 
> supports all collation types currently supported in Spark. To understand what 
> changes were introduced in order to enable full collation support for other 
> existing functions in Spark, take a look at the Spark PRs and Jira tickets 
> for completed tasks in this parent (for example: Contains, StartsWith, 
> EndsWith).
> Examples:
> With UTF8_BINARY collation, the query
> SELECT mode(col) FROM VALUES ('a'), ('a'), ('a'), ('B'), ('B'), ('b'), ('b') 
> AS tab(col);
> should return 'a'.
> With UTF8_BINARY_LCASE collation, the query
> SELECT mode(col) FROM VALUES ('a'), ('a'), ('a'), ('B'), ('B'), ('b'), ('b') 
> AS tab(col);
> should return either 'B' or 'b'.
>  
> Read more about ICU [Collation Concepts|http://example.com/] and 
> [Collator|http://example.com/] class. Also, refer to the Unicode Technical 
> Standard for 
> [collation|https://www.unicode.org/reports/tr35/tr35-collation.html#Collation_Type_Fallback].

[jira] [Commented] (SPARK-48045) Pandas API groupby with multi-agg-relabel ignores as_index=False

2024-05-05 Thread Saidatt Sinai Amonkar (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-48045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17843576#comment-17843576
 ] 

Saidatt Sinai Amonkar commented on SPARK-48045:
---

Opened a pull request to fix this: [GitHub Pull Request 
#46391|https://github.com/apache/spark/pull/46391]

> Pandas API groupby with multi-agg-relabel ignores as_index=False
> 
>
> Key: SPARK-48045
> URL: https://issues.apache.org/jira/browse/SPARK-48045
> Project: Spark
>  Issue Type: Bug
>  Components: Pandas API on Spark
>Affects Versions: 3.5.1
> Environment: Python 3.11, PySpark 3.5.1, Pandas=2.2.2
>Reporter: Paul George
>Priority: Minor
>  Labels: pull-request-available
>
> A Pandas API DataFrame groupby with as_index=False and a multilevel 
> relabeling, such as
> {code:java}
> from pyspark import pandas as ps
> ps.DataFrame({"a": [0, 0], "b": [0, 1]}).groupby("a", 
> as_index=False).agg(b_max=("b", "max")){code}
> fails to include group keys in the resulting DataFrame. This diverges from 
> expected behavior as well as from the behavior of native Pandas, e.g.
> *actual*
> {code:java}
>    b_max
> 0      1 {code}
> *expected*
> {code:java}
>    a  b_max
> 0  0      1 {code}
>  
> A possible fix is to prepend groupby key columns to {{*order*}} and 
> {{*columns*}} before filtering here:  
> [https://github.com/apache/spark/blob/master/python/pyspark/pandas/groupby.py#L327-L328]
>  
>  
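Until a fix lands, a possible user-side workaround sketch (not the eventual fix 
in groupby.py): aggregate with the default as_index=True and restore the group 
key as a column afterwards with reset_index():

{code:python}
from pyspark import pandas as ps

psdf = ps.DataFrame({"a": [0, 0], "b": [0, 1]})

# With the default as_index=True the group key lands in the index;
# reset_index() turns it back into a regular column.
result = psdf.groupby("a").agg(b_max=("b", "max")).reset_index()
print(result)
#    a  b_max
# 0  0      1
{code}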






[jira] [Updated] (SPARK-48045) Pandas API groupby with multi-agg-relabel ignores as_index=False

2024-05-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-48045:
---
Labels: pull-request-available  (was: )

> Pandas API groupby with multi-agg-relabel ignores as_index=False
> 
>
> Key: SPARK-48045
> URL: https://issues.apache.org/jira/browse/SPARK-48045
> Project: Spark
>  Issue Type: Bug
>  Components: Pandas API on Spark
>Affects Versions: 3.5.1
> Environment: Python 3.11, PySpark 3.5.1, Pandas=2.2.2
>Reporter: Paul George
>Priority: Minor
>  Labels: pull-request-available
>
> A Pandas API DataFrame groupby with as_index=False and a multilevel 
> relabeling, such as
> {code:java}
> from pyspark import pandas as ps
> ps.DataFrame({"a": [0, 0], "b": [0, 1]}).groupby("a", 
> as_index=False).agg(b_max=("b", "max")){code}
> fails to include group keys in the resulting DataFrame. This diverges from 
> expected behavior as well as from the behavior of native Pandas, e.g.
> *actual*
> {code:java}
>    b_max
> 0      1 {code}
> *expected*
> {code:java}
>    a  b_max
> 0  0      1 {code}
>  
> A possible fix is to prepend groupby key columns to {{*order*}} and 
> {{*columns*}} before filtering here:  
> [https://github.com/apache/spark/blob/master/python/pyspark/pandas/groupby.py#L327-L328]
>  
>  






[jira] [Commented] (SPARK-47353) Mode (all collations)

2024-05-05 Thread Gideon P (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-47353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17843574#comment-17843574
 ] 

Gideon P commented on SPARK-47353:
--

[~uros-db] Mode uses an accumulating OpenHashMap to determine the count of each 
unique element. 

Currently, the Apache Spark Mode function uses OpenHashMap to track occurrences 
of each key. However, with collation ordering (where multiple keys might 
compare as equal), a direct hash map will not work, since distinct keys may 
need to be treated as the same value. 

A few approaches to handling collations come to mind:
1. Modify the `Mode.eval` implementation to combine the map further, perhaps by 
turning the map into a list of key-value tuples and folding: if the last 
element of the accumulating list and the current element being folded are equal 
according to the collation, combine their counts. 
2. Another way to combine the map further in `Mode.eval` would be to add all 
the elements of the buffer to a TreeMap built with a collation-sensitive 
Comparator; a TreeMap can efficiently keep track of values and their counts in 
sorted order. 
3. Use a TreeMap instead of OpenHashMap during the accumulation stage. Create a 
trait similar to TypedAggregateWithHashMapAsBuffer and switch to it whenever 
the column's data type is StringType and a session collation is in use; it 
would implement TypedImperativeAggregate. 

To start, I will try approach number 2.

Please let me know if I am on the right track and if you have any ideas! 

> Mode (all collations)
> -
>
> Key: SPARK-47353
> URL: https://issues.apache.org/jira/browse/SPARK-47353
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Uroš Bojanić
>Priority: Major
>
> Enable collation support for the *Mode* expression in Spark. First confirm 
> the expected behaviour for this expression when given collated strings, then 
> move on to an implementation that can handle strings of all collation types. 
> Implement the corresponding unit tests and E2E SQL tests to reflect how this 
> function should be used with collation in SparkSQL, and feel free to use 
> your chosen Spark SQL Editor to experiment with the existing functions to 
> learn more about how they work. In addition, look into the possible 
> use-cases and implementations of similar functions within other open-source 
> DBMSs, such as [PostgreSQL|https://www.postgresql.org/docs/].
>  
> The goal for this Jira ticket is to implement the *Mode* expression so it 
> supports all collation types currently supported in Spark. To understand what 
> changes were introduced in order to enable full collation support for other 
> existing functions in Spark, take a look at the Spark PRs and Jira tickets 
> for completed tasks in this parent (for example: Contains, StartsWith, 
> EndsWith).
> Examples:
> With UTF8_BINARY collation, the query
> SELECT mode(col) FROM VALUES ('a'), ('a'), ('a'), ('B'), ('B'), ('b'), ('b') 
> AS tab(col);
> should return 'a'.
> With UTF8_BINARY_LCASE collation, the query
> SELECT mode(col) FROM VALUES ('a'), ('a'), ('a'), ('B'), ('B'), ('b'), ('b') 
> AS tab(col);
> should return either 'B' or 'b'.
>  
> Read more about ICU [Collation Concepts|http://example.com/] and 
> [Collator|http://example.com/] class. Also, refer to the Unicode Technical 
> Standard for 
> [collation|https://www.unicode.org/reports/tr35/tr35-collation.html#Collation_Type_Fallback].






[jira] [Updated] (SPARK-48134) Spark core (java side): Migrate `error/warn/info` with variables to structured logging framework

2024-05-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-48134:
---
Labels: pull-request-available  (was: )

> Spark core (java side): Migrate `error/warn/info` with variables to 
> structured logging framework
> 
>
> Key: SPARK-48134
> URL: https://issues.apache.org/jira/browse/SPARK-48134
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Affects Versions: 4.0.0
>Reporter: BingKun Pan
>Priority: Critical
>  Labels: pull-request-available
>







[jira] [Created] (SPARK-48134) Spark core (java side): Migrate `error/warn/info` with variables to structured logging framework

2024-05-05 Thread BingKun Pan (Jira)
BingKun Pan created SPARK-48134:
---

 Summary: Spark core (java side): Migrate `error/warn/info` with 
variables to structured logging framework
 Key: SPARK-48134
 URL: https://issues.apache.org/jira/browse/SPARK-48134
 Project: Spark
  Issue Type: Sub-task
  Components: Spark Core
Affects Versions: 4.0.0
Reporter: BingKun Pan





