This is an automated email from the ASF dual-hosted git repository.
indhumuthumurugesh pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/carbondata.git
The following commit(s) were added to refs/heads/master by this push:
new c840b5f [CARBONDATA-4325] Update Data frame supported options in
document and fix partition table creation with df spatial property
c840b5f is described below
commit c840b5f30b15df54778b2a83608c727d25553d7c
Author: ShreelekhyaG <[email protected]>
AuthorDate: Mon Feb 28 14:57:34 2022 +0530
[CARBONDATA-4325] Update Data frame supported options in document and fix
partition table creation with df spatial property
Why is this PR needed?
1. Only specific properties are supported through dataframe options, and
the documentation does not list them.
2. Creating a partition table fails when the Spatial index property is set
on a carbon table created with a dataframe in spark-shell.
What changes were proposed in this PR?
1. Added the properties supported through dataframe options to the
documentation.
2. In spark-shell, the table is created with a carbon session, and
catalogTable.properties is empty in that case. The properties are now read
from catalogTable.storage.properties, which holds the properties that were set.
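The lookup change in item 2 can be illustrated with a minimal, self-contained sketch. The `Table` and `StorageFormat` case classes below are simplified stand-ins invented for this illustration, not Spark's actual catalog classes, and the key string is a placeholder assumed to play the role of `CarbonCommonConstants.SPATIAL_INDEX`.

```scala
object SpatialPropertyLookupSketch {
  // Simplified stand-ins (hypothetical) for the catalog table and its storage
  // format; only the two property maps mentioned in the commit are modeled.
  final case class StorageFormat(properties: Map[String, String])
  final case class Table(properties: Map[String, String], storage: StorageFormat)

  // Placeholder key, assumed to stand in for CarbonCommonConstants.SPATIAL_INDEX.
  val SpatialIndexKey = "spatial_index"

  // Lookup as it was before the change: reads the table-level properties.
  def lookupBefore(table: Table): Option[String] =
    table.properties.get(SpatialIndexKey)

  // Lookup after the change: reads the storage-level properties.
  def lookupAfter(table: Table): Option[String] =
    table.storage.properties.get(SpatialIndexKey)

  def main(args: Array[String]): Unit = {
    // Models the spark-shell case described above: table-level properties are
    // empty, while the storage-level properties carry the spatial index name.
    val shellTable =
      Table(Map.empty, StorageFormat(Map(SpatialIndexKey -> "mygeohash")))
    println(lookupBefore(shellTable)) // None
    println(lookupAfter(shellTable))  // Some(mygeohash)
  }
}
```

In this model the old lookup finds nothing, so a spatial table would not be recognised as one; the new lookup returns the configured index name.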
Does this PR introduce any user interface change?
No
Is any new testcase added?
No, tested in cluster.
This closes #4250
---
docs/carbon-as-spark-datasource-guide.md | 18 ++++++++++++++++++
.../execution/command/management/CommonLoadUtils.scala | 3 ++-
2 files changed, 20 insertions(+), 1 deletion(-)
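
The spatial properties documented by this commit are keyed by an index name that replaces `xxx`. As a rough sketch of how such an option set might be assembled, the block below builds the keys as a plain Scala map. The index name `mygeohash` and every value are illustrative assumptions, not taken from the commit, and `toScaledLong` is a hypothetical helper that shows the scaling described for `conversionRatio`.

```scala
object SpatialOptionsSketch {
  // Hypothetical index name; it takes the place of `xxx` in the sub-property keys.
  val indexName = "mygeohash"

  // Illustrative values only; none of them come from the commit.
  val spatialOptions: Map[String, String] = Map(
    "SPATIAL_INDEX" -> indexName,
    s"SPATIAL_INDEX.$indexName.type" -> "geohash",
    s"SPATIAL_INDEX.$indexName.sourcecolumns" -> "longitude, latitude",
    s"SPATIAL_INDEX.$indexName.originLatitude" -> "39.832277",
    s"SPATIAL_INDEX.$indexName.gridSize" -> "50",
    s"SPATIAL_INDEX.$indexName.conversionRatio" -> "1000000"
  )

  // Scales a coordinate in degrees by the conversion ratio and truncates the
  // result to a long, e.g. 13.123456 with ratio 1000000 gives 13123456.
  def toScaledLong(degrees: BigDecimal, ratio: Long): Long =
    (degrees * BigDecimal(ratio)).toLong
}
```

With a Spark session available, such a map could presumably be handed to the writer through `df.write.format("carbon").options(SpatialOptionsSketch.spatialOptions).save(path)`, where `df` and `path` are placeholders; whether every key is accepted on that path should be checked against the CarbonData documentation.
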
diff --git a/docs/carbon-as-spark-datasource-guide.md b/docs/carbon-as-spark-datasource-guide.md
index 275d5b1..e578ed0 100644
--- a/docs/carbon-as-spark-datasource-guide.md
+++ b/docs/carbon-as-spark-datasource-guide.md
@@ -96,6 +96,24 @@ df.write.format("carbon").save("/user/person_table")
val dfread = spark.read.format("carbon").load("/user/person_table")
dfread.show()
```
+## Supported OPTIONS using dataframe
+
+In addition to the above [Supported Options](#supported-options), the following properties are supported using dataframe.
+
+| Property | Default Value | Description |
+|-----------------------------------|-------------------------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
+| bucket_number | NA | Number of buckets to be created. For more details, see [Bucketing](./ddl-of-carbondata.md#bucketing). |
+| bucket_columns | NA | Columns which are to be placed in buckets. For more details, see [Bucketing](./ddl-of-carbondata.md#bucketing). |
+| streaming | false | Whether the table is a streaming table. For more details, see [Streaming](./ddl-of-carbondata.md#streaming). |
+| timestampformat | yyyy-MM-dd HH:mm:ss | For specifying the format of TIMESTAMP data type column. For more details, see [TimestampFormat](./ddl-of-carbondata.md#dateformattimestampformat). |
+| dateformat | yyyy-MM-dd | For specifying the format of DATE data type column. For more details, see [DateFormat](./ddl-of-carbondata.md#dateformattimestampformat). |
+| SPATIAL_INDEX | NA | Used to configure Spatial Index name. This name is appended to `SPATIAL_INDEX` in the subsequent sub-property configurations. `xxx` in the below sub-properties refers to the index name. The generated spatial index column is not allowed in any properties except the `SORT_COLUMNS` table property. For more details, see [Spatial Index](./spatial-index-guide). |
+| SPATIAL_INDEX.xxx.type | NA | Type of algorithm for processing spatial data. Currently, supports 'geohash' and 'geosot'. |
+| SPATIAL_INDEX.xxx.sourcecolumns | NA | longitude and latitude column names as in the table. These columns are used to generate index value for each row. |
+| SPATIAL_INDEX.xxx.originLatitude | NA | Latitude of origin. |
+| SPATIAL_INDEX.xxx.gridSize | NA | Grid size of raster data in metres. Currently, spatial index supports raster data. |
+| SPATIAL_INDEX.xxx.conversionRatio | NA | Conversion factor. It allows the user to translate longitude and latitude to long. For example, if the data to load is longitude = 13.123456, latitude = 101.12356, the user can configure the conversion ratio sub-property value as 1000000 and change the data to load to longitude = 13123456 and latitude = 101123560. Operations on long are much faster than on floating-point numbers. |
+| SPATIAL_INDEX.xxx.class | NA | Optional user custom implementation class. Value is fully qualified class name. |
Reference : [list of carbon properties](./configuration-parameters.md)
diff --git a/integration/spark/src/main/scala/org/apache/spark/sql/execution/command/management/CommonLoadUtils.scala b/integration/spark/src/main/scala/org/apache/spark/sql/execution/command/management/CommonLoadUtils.scala
index bdb3054..5cbdb3b 100644
--- a/integration/spark/src/main/scala/org/apache/spark/sql/execution/command/management/CommonLoadUtils.scala
+++ b/integration/spark/src/main/scala/org/apache/spark/sql/execution/command/management/CommonLoadUtils.scala
@@ -928,7 +928,8 @@ object CommonLoadUtils {
         .map(columnName => columnName.toLowerCase())
       attributes.filterNot(a => staticPartCols.contains(a.name.toLowerCase))
     }
-    val spatialProperty = catalogTable.properties.get(CarbonCommonConstants.SPATIAL_INDEX)
+    val spatialProperty = catalogTable.storage
+      .properties.get(CarbonCommonConstants.SPATIAL_INDEX)
     // For spatial table, dataframe attributes will not contain geoId column.
     val isSpatialTable = spatialProperty.isDefined && spatialProperty.nonEmpty &&
                          dfAttributes.length + 1 == expectedColumns.size