[carbondata] branch master updated: [CARBONDATA-4325] Update Data frame supported options in document and fix partition table creation with df spatial property

indhumuthumurugesh Fri, 04 Mar 2022 00:01:05 -0800

This is an automated email from the ASF dual-hosted git repository.

indhumuthumurugesh pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/carbondata.git



The following commit(s) were added to refs/heads/master by this push:
     new c840b5f  [CARBONDATA-4325] Update Data frame supported options in document and fix partition table creation with df spatial property
c840b5f is described below

commit c840b5f30b15df54778b2a83608c727d25553d7c
Author: ShreelekhyaG <[email protected]>
AuthorDate: Mon Feb 28 14:57:34 2022 +0530

    [CARBONDATA-4325] Update Data frame supported options in document and fix partition table creation with df spatial property
    
    Why is this PR needed?
    1. Only specific properties are supported using dataframe options. Need to update the documentation.
    2. Create partition table fails with Spatial index property for carbon table created with dataframe in spark-shell.
    
    What changes were proposed in this PR?
    1. Added data frame supported properties in the documentation.
    2. Using spark-shell, the table gets created with carbon session and catalogTable.properties
    is empty here. Getting the properties from catalogTable.storage.properties to access the properties set.
    
    Does this PR introduce any user interface change?
    No
    
    Is any new testcase added?
    No, tested in cluster.
    
    This closes #4250
---
 docs/carbon-as-spark-datasource-guide.md               | 18 ++++++++++++++++++
 .../execution/command/management/CommonLoadUtils.scala |  3 ++-
 2 files changed, 20 insertions(+), 1 deletion(-)

diff --git a/docs/carbon-as-spark-datasource-guide.md b/docs/carbon-as-spark-datasource-guide.md
index 275d5b1..e578ed0 100644
--- a/docs/carbon-as-spark-datasource-guide.md
+++ b/docs/carbon-as-spark-datasource-guide.md
@@ -96,6 +96,24 @@ df.write.format("carbon").save("/user/person_table")
 val dfread = spark.read.format("carbon").load("/user/person_table")
 dfread.show()
 ```
+## Supported OPTIONS using dataframe
+
+In addition to the above [Supported Options](#supported-options), following properties are supported using dataframe.
+
+| Property                          | Default Value       | Description |
+|-----------------------------------|-------------------------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
+| bucket_number                     | NA                  | Number of buckets to be created. For more details, see [Bucketing](./ddl-of-carbondata.md#bucketing). |
+| bucket_columns                    | NA                  | Columns which are to be placed in buckets. For more details, see [Bucketing](./ddl-of-carbondata.md#bucketing). |
+| streaming                         | false               | Whether the table is a streaming table. For more details, see [Streaming](./ddl-of-carbondata.md#streaming). |
+| timestampformat                   | yyyy-MM-dd HH:mm:ss | For specifying the format of TIMESTAMP data type column. For more details, see [TimestampFormat](./ddl-of-carbondata.md#dateformattimestampformat). |
+| dateformat                        | yyyy-MM-dd          | For specifying the format of DATE data type column. For more details, see [DateFormat](./ddl-of-carbondata.md#dateformattimestampformat). |
+| SPATIAL_INDEX                     | NA                  | Used to configure Spatial Index name. This name is appended to `SPATIAL_INDEX` in the subsequent sub-property configurations. `xxx` in the below sub-properties refer to index name. Generated spatial index column is not allowed in any properties except in `SORT_COLUMNS` table property.For more details, see [Spatial Index](./spatial-index-guide). |
+| SPATIAL_INDEX.xxx.type            | NA                  | Type of algorithm for processing spatial data. Currently, supports 'geohash' and 'geosot'. |
+| SPATIAL_INDEX.xxx.sourcecolumns   | NA                  | longitude and latitude column names as in the table. These columns are used to generate index value for each row. |
+| SPATIAL_INDEX.xxx.originLatitude  | NA                  | Latitude of origin. |
+| SPATIAL_INDEX.xxx.gridSize        | NA                  | Grid size of raster data in metres. Currently, spatial index supports raster data. |
+| SPATIAL_INDEX.xxx.conversionRatio | NA                  | Conversion factor. It allows user to translate longitude and latitude to long. For example, if the data to load is longitude = 13.123456, latitude = 101.12356. User can configure conversion ratio sub-property value as 1000000, and change data to load as longitude = 13123456 and latitude = 10112356. Operations on long is much faster compared to floating-point numbers. |
+| SPATIAL_INDEX.xxx.class           | NA                  | Optional user custom implementation class. Value is fully qualified class name. |
 
 Reference : [list of carbon properties](./configuration-parameters.md)
 
diff --git a/integration/spark/src/main/scala/org/apache/spark/sql/execution/command/management/CommonLoadUtils.scala b/integration/spark/src/main/scala/org/apache/spark/sql/execution/command/management/CommonLoadUtils.scala
index bdb3054..5cbdb3b 100644
--- a/integration/spark/src/main/scala/org/apache/spark/sql/execution/command/management/CommonLoadUtils.scala
+++ b/integration/spark/src/main/scala/org/apache/spark/sql/execution/command/management/CommonLoadUtils.scala
@@ -928,7 +928,8 @@ object CommonLoadUtils {
               .map(columnName => columnName.toLowerCase())
             attributes.filterNot(a => staticPartCols.contains(a.name.toLowerCase))
           }
-          val spatialProperty = catalogTable.properties.get(CarbonCommonConstants.SPATIAL_INDEX)
+          val spatialProperty = catalogTable.storage
+            .properties.get(CarbonCommonConstants.SPATIAL_INDEX)
           // For spatial table, dataframe attributes will not contain geoId column.
           val isSpatialTable = spatialProperty.isDefined && spatialProperty.nonEmpty &&
                                    dfAttributes.length + 1 == expectedColumns.size
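
For readers who want to see how the documented options might be passed, the following is a minimal sketch and not part of the commit. It assumes a Spark build with the CarbonData integration on the classpath, reuses the `format("carbon")` writer shown in the guide, and substitutes a hypothetical index name `mygeohash` for `xxx`. The sample rows, the output path, and every option value are illustrative assumptions; whether a particular combination of options is accepted may depend on the CarbonData version and on how the table is created.

```scala
import org.apache.spark.sql.{SaveMode, SparkSession}

object DataFrameOptionsSketch {
  def main(args: Array[String]): Unit = {
    // Assumption: CarbonData's Spark integration is available to this session.
    val spark = SparkSession.builder()
      .appName("carbon-df-options-sketch")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical sample rows; longitude and latitude are already stored as
    // long values, as the conversionRatio description suggests.
    val df = Seq(
      ("2022-03-04 00:01:05", 116285807L, 40084087L),
      ("2022-03-04 00:02:10", 116372142L, 40129503L)
    ).toDF("event_time", "longitude", "latitude")

    // Option keys come from the documentation table; "mygeohash" replaces "xxx".
    // All values below are illustrative, not recommended settings.
    val carbonOptions = Map(
      "timestampformat" -> "yyyy-MM-dd HH:mm:ss",
      "SPATIAL_INDEX" -> "mygeohash",
      "SPATIAL_INDEX.mygeohash.type" -> "geohash",
      "SPATIAL_INDEX.mygeohash.sourcecolumns" -> "longitude, latitude",
      "SPATIAL_INDEX.mygeohash.originLatitude" -> "39.832277",
      "SPATIAL_INDEX.mygeohash.gridSize" -> "50",
      "SPATIAL_INDEX.mygeohash.conversionRatio" -> "1000000"
    )

    df.write
      .format("carbon")
      .options(carbonOptions)
      .mode(SaveMode.Overwrite)
      .save("/tmp/carbon_df_options_sketch") // hypothetical output path

    spark.stop()
  }
}
```

Collecting the keys in a single `Map` keeps the sub-property names next to the index name they depend on, which makes a mismatch between `SPATIAL_INDEX` and its `SPATIAL_INDEX.<name>.*` keys easier to spot.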

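
The effect of the `CommonLoadUtils` change can be illustrated without Spark. The sketch below is an assumption-laden model rather than the project's code: `TableMeta` and `StorageFormat` are hypothetical stand-ins for Spark's catalog classes, and the key literal is a placeholder for `CarbonCommonConstants.SPATIAL_INDEX`. It reproduces the situation described in the commit message, where the table-level properties are empty and the spatial property is only present in the storage properties.

```scala
// Hypothetical, simplified stand-ins for Spark's catalog table and storage format.
final case class StorageFormat(properties: Map[String, String])
final case class TableMeta(properties: Map[String, String], storage: StorageFormat)

object SpatialLookupSketch {
  // Placeholder key; the real code uses CarbonCommonConstants.SPATIAL_INDEX.
  val SpatialIndexKey = "spatial_index"

  // Lookup as it was before the commit: table-level properties only.
  def spatialPropertyBefore(table: TableMeta): Option[String] =
    table.properties.get(SpatialIndexKey)

  // Lookup as it is after the commit: storage-level properties.
  def spatialPropertyAfter(table: TableMeta): Option[String] =
    table.storage.properties.get(SpatialIndexKey)

  // Mirrors the isSpatialTable condition in the hunk: the dataframe lacks the
  // generated geoId column, so it has one attribute fewer than the table.
  def isSpatialTable(spatialProperty: Option[String],
                     dfAttributeCount: Int,
                     expectedColumnCount: Int): Boolean =
    spatialProperty.isDefined && dfAttributeCount + 1 == expectedColumnCount

  def main(args: Array[String]): Unit = {
    // Table created through dataframe options: properties empty, storage populated.
    val table = TableMeta(
      properties = Map.empty,
      storage = StorageFormat(Map(SpatialIndexKey -> "mygeohash"))
    )
    println(spatialPropertyBefore(table)) // None
    println(spatialPropertyAfter(table))  // Some(mygeohash)
    println(isSpatialTable(spatialPropertyAfter(table), 3, 4)) // true
  }
}
```

With the old lookup the property is never found in this scenario, so the table is not recognised as spatial and the column-count adjustment for the generated geoId column is skipped.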