This is an automated email from the ASF dual-hosted git repository.

ajantha pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/carbondata.git


The following commit(s) were added to refs/heads/master by this push:
     new dc77027  [CARBONDATA-3548] Add spatial-index user guid to doc
dc77027 is described below

commit dc77027cba5643b9c228668acd306fee3a0a7517
Author: litao <litao_xid...@126.com>
AuthorDate: Thu Dec 19 21:12:21 2019 +0800

    [CARBONDATA-3548] Add spatial-index user guid to doc
    
    Why is this PR needed?
    Spatial index feature document is not updated yet.
    
    What changes were proposed in this PR?
    updated the document for spatial index feature
    
    Does this PR introduce any user interface change?
    updated the document for spatial index feature.
    
    Is any new testcase added?
    No
    
    This closes #3520
    
    Co-authored-by: VenuReddy2103 <venugopalred...@huawei.com>
---
 docs/images/spatial-index-1.png | Bin 0 -> 1835 bytes
 docs/images/spatial-index-2.png | Bin 0 -> 16910 bytes
 docs/spatial-index-guide.md     |  97 ++++++++++++++++++++++++++++++++++++++++
 3 files changed, 97 insertions(+)

diff --git a/docs/images/spatial-index-1.png b/docs/images/spatial-index-1.png
new file mode 100644
index 0000000..7890fc4
Binary files /dev/null and b/docs/images/spatial-index-1.png differ
diff --git a/docs/images/spatial-index-2.png b/docs/images/spatial-index-2.png
new file mode 100644
index 0000000..d55f01f
Binary files /dev/null and b/docs/images/spatial-index-2.png differ
diff --git a/docs/spatial-index-guide.md b/docs/spatial-index-guide.md
new file mode 100644
index 0000000..a9010a0
--- /dev/null
+++ b/docs/spatial-index-guide.md
@@ -0,0 +1,97 @@
+<!--
+    Licensed to the Apache Software Foundation (ASF) under one or more 
+    contributor license agreements.  See the NOTICE file distributed with
+    this work for additional information regarding copyright ownership. 
+    The ASF licenses this file to you under the Apache License, Version 2.0
+    (the "License"); you may not use this file except in compliance with 
+    the License.  You may obtain a copy of the License at
+
+      http://www.apache.org/licenses/LICENSE-2.0
+    
+    Unless required by applicable law or agreed to in writing, software 
+    distributed under the License is distributed on an "AS IS" BASIS, 
+    WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+    See the License for the specific language governing permissions and 
+    limitations under the License.
+-->
+
+# What is spatial index
+
+[A spatial index](https://gistbok.ucgis.org/topic-keywords/indexing) is a data 
structure that allows for accessing a spatial object efficiently. It is a 
common technique used by spatial databases.  Without indexing, any search for a 
feature would require a "sequential scan" of every record in the database, 
resulting in much longer processing time. In a spatial index construction 
process, the minimum bounding rectangle serves as an object approximation. 
Various types of spatial indices ac [...]
+
+
+
+# How does CarbonData implement spatial index
+
+There are many open source implementations for spatial indexing and for 
processing spatial queries. CarbonData implements spatial indexing differently. 
Its core idea is to use raster data. A raster is a matrix of cells 
organized into rows and columns (called a grid). Each cell represents a 
coordinate, and the index for that coordinate is generated from its longitude and 
latitude, like the [Z order curve](https://en.wikipedia.org/wiki/Z-order_curve).
+
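The Z-order idea mentioned above can be sketched as follows. This is a minimal illustration, not CarbonData's actual code: the function name `interleave_bits`, the 16-bit width, and the choice of which coordinate occupies the even bit positions are all assumptions made for the example.

```python
def interleave_bits(row: int, col: int, bits: int = 16) -> int:
    """Interleave the low `bits` bits of row and col into one Z-order (Morton) code."""
    code = 0
    for i in range(bits):
        code |= ((col >> i) & 1) << (2 * i)      # column bits go to even positions
        code |= ((row >> i) & 1) << (2 * i + 1)  # row bits go to odd positions
    return code


# Print the codes of a 4 x 4 grid; neighbouring cells tend to get nearby codes.
codes = [[interleave_bits(r, c) for c in range(4)] for r in range(4)]
for line in codes:
    print(line)
```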
+CarbonData rasterizes the user data while loading it into segments. A set of 
latitude and longitude values represents a grid range. The size of the grid can be 
configured. Hence, the loaded coordinates are often discrete rather than continuous.
+
+The figure below shows the relationship between the grid and the points residing 
in it. The black point represents the center point of the grid, and the red points 
are coordinates at arbitrary positions inside the grid. The red points 
can be replaced by the center point of the grid to indicate that they 
lie within the grid. During data load, CarbonData generates an index for each 
coordinate according to the row and column of the grid (in the raster) where that 
coordinate lies. These indices [...]
+
+![Relationship between the grid and the points residing in it](../docs/images/spatial-index-1.png?raw=true)
+
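A rough sketch of the grid lookup described above is shown here. It assumes the coordinates are already integers and, for simplicity, that the cell size is expressed in the same integer units rather than in metres. The names `grid_cell` and `cell_center` and the cell size of 500 are hypothetical; the origin values are the example table's minimum longitude and latitude scaled by 1000000.

```python
from typing import Tuple


def grid_cell(lon: int, lat: int, min_lon: int, min_lat: int, cell: int) -> Tuple[int, int]:
    """Return the (row, column) of the grid cell that contains the point."""
    col = (lon - min_lon) // cell
    row = (lat - min_lat) // cell
    return row, col


def cell_center(row: int, col: int, min_lon: int, min_lat: int, cell: int) -> Tuple[float, float]:
    """Return the (longitude, latitude) of the center of a grid cell."""
    return (min_lon + (col + 0.5) * cell, min_lat + (row + 0.5) * cell)


# Hypothetical values: origin taken from the example table, cell size assumed.
MIN_LON, MIN_LAT, CELL = 1_811_865, 19_832_277, 500
a = grid_cell(1_812_000, 19_832_400, MIN_LON, MIN_LAT, CELL)
b = grid_cell(1_812_300, 19_832_700, MIN_LON, MIN_LAT, CELL)
print(a, b)  # both points fall into the same cell
print(cell_center(*a, MIN_LON, MIN_LAT, CELL))  # the point that stands in for them
```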
+CarbonData supports a Polygon User Defined Function (UDF) as a filter condition in a 
query to return all the data points lying within the polygon. The Polygon UDF takes 
multiple points (i.e., pairs of longitude and latitude) separated by commas. The 
longitude and latitude in each pair are separated by a space. The first and last 
points of the polygon must be the same so that they form a closed loop. CarbonData builds a quad tree 
using this polygon and the spatial region information passed while creating the 
table. The nodes in the quad  [...]
+The main reasons for faster query response are as follows:
+* Data is sorted based on the index values.
+* The Polygon UDF filter is pushed down from the engine to the carbon layer, so 
CarbonData scans only the matched blocklets and avoids a full scan.
+
+![File Directory Structure](../docs/images/spatial-index-2.png?raw=true)
+
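To make the polygon input format and the containment check concrete, here is a small sketch. It parses a polygon string of the form used by the Polygon UDF and tests a single point with plain ray casting. This is not the quad tree approach CarbonData uses; ray casting is swapped in only to illustrate what "lying within the polygon" means, and the helper names are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (longitude, latitude)


def parse_polygon(text: str) -> List[Point]:
    """Parse 'lon lat,lon lat,...' and check that the ring is closed."""
    points = []
    for pair in text.split(","):
        lon, lat = pair.split()
        points.append((float(lon), float(lat)))
    if len(points) < 4 or points[0] != points[-1]:
        raise ValueError("polygon must be a closed loop: first point == last point")
    return points


def in_polygon(point: Point, ring: List[Point]) -> bool:
    """Ray casting: count how many edges a horizontal ray from the point crosses."""
    x, y = point
    inside = False
    for (x1, y1), (x2, y2) in zip(ring, ring[1:]):
        if (y1 > y) != (y2 > y):  # edge spans the latitude of the ray
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


ring = parse_polygon(
    "16.321011 4.123503,16.137676 5.947911,16.560993 5.935276,16.321011 4.123503"
)
print(in_polygon((16.33, 5.5), ring))
print(in_polygon((17.0, 5.5), ring))
```

An odd number of edge crossings means the point is inside the ring; an even number means it is outside.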
+
+# Installation and Deployment
+
+Geo is a separate module in the project. It can be included in or excluded from 
the project build as required.
+
+## Basic Command
+
+### Create Table
+
+Create table with spatial index table properties
+
+```
+create table source_index(id BIGINT, latitude long, longitude long) stored by 
'carbondata' TBLPROPERTIES (
+'INDEX_HANDLER'='mygeohash',  
+'INDEX_HANDLER.mygeohash.type'='geohash',   
+'INDEX_HANDLER.mygeohash.sourcecolumns'='longitude, latitude',   
+'INDEX_HANDLER.mygeohash.originLatitude'='19.832277',   
+'INDEX_HANDLER.mygeohash.gridSize'='50',   
+'INDEX_HANDLER.mygeohash.minLongitude'='1.811865',   
+'INDEX_HANDLER.mygeohash.maxLongitude'='2.782233',   
+'INDEX_HANDLER.mygeohash.minLatitude'='19.832277',   
+'INDEX_HANDLER.mygeohash.maxLatitude'='20.225281',   
+'INDEX_HANDLER.mygeohash.conversionRatio'='1000000');
+```
+Note: `mygeohash` in the above example represents the index name.
+
+#### List of spatial index table properties
+
+|Name|Description|
+|-----------------------------------|-----------------------------------------------------------------------------------------|
+| INDEX_HANDLER | Used to configure the index name. This name is appended to 
`INDEX_HANDLER` in the subsequent sub-property configurations. `xxx` in the 
sub-properties below refers to the index name.|
+| INDEX_HANDLER.xxx.type | Type of algorithm for processing spatial data. 
Currently, only 'geohash' is supported.|
+| INDEX_HANDLER.xxx.sourcecolumns | Longitude and latitude column names as in 
the table. These columns are used to generate the index value for each row.|
+| INDEX_HANDLER.xxx.gridSize | Grid size of raster data in metres. Currently, 
spatial index supports raster data.|
+| INDEX_HANDLER.xxx.minLongitude | Minimum longitude of the gridded 
rectangular area.|
+| INDEX_HANDLER.xxx.maxLongitude | Maximum longitude of the gridded 
rectangular area.|
+| INDEX_HANDLER.xxx.minLatitude | Minimum latitude of the gridded rectangular 
area.|
+| INDEX_HANDLER.xxx.maxLatitude | Maximum latitude of the gridded rectangular 
area.|
+| INDEX_HANDLER.xxx.conversionRatio | Conversion factor. It allows the user to 
translate longitude and latitude to long. For example, if the data to load is 
longitude = 13.123456 and latitude = 101.12356, the user can configure the 
conversion ratio sub-property value as 1000000 and change the data to load to 
longitude = 13123456 and latitude = 101123560. Operations on long are much faster 
than operations on floating-point numbers.|
+| INDEX_HANDLER.xxx.class | Optional user custom implementation class. Value 
is fully qualified class name.|
+
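The conversion ratio arithmetic can be sketched as below. The helper names `to_long` and `to_degrees` are hypothetical and only illustrate how source data might be prepared before loading; they are not part of CarbonData.

```python
CONVERSION_RATIO = 1_000_000  # value of the conversionRatio sub-property in the example


def to_long(degrees: float, ratio: int = CONVERSION_RATIO) -> int:
    """Scale a floating-point coordinate to an integer; round() absorbs float error."""
    return round(degrees * ratio)


def to_degrees(value: int, ratio: int = CONVERSION_RATIO) -> float:
    """Convert a stored integer coordinate back to degrees."""
    return value / ratio


print(to_long(13.123456))
print(to_degrees(13123456))
```

Using `round()` rather than `int()` avoids off-by-one results when the floating-point product lands slightly below the intended integer.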
+
+### Select Query
+
+Query with Polygon UDF predicate
+
+```
+select * from source_index where IN_POLYGON('16.321011 4.123503,16.137676 
5.947911,16.560993 5.935276,16.321011 4.123503')
+```
+
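When the polygon comes from application code, the predicate string can be assembled programmatically. The sketch below uses a hypothetical helper, `polygon_predicate`, which joins longitude and latitude pairs in the format shown above and repeats the first point at the end if the caller did not close the loop. It only builds the SQL text; how the query is submitted is left to the caller.

```python
from typing import Sequence, Tuple


def polygon_predicate(points: Sequence[Tuple[float, float]]) -> str:
    """Build an IN_POLYGON predicate from (longitude, latitude) pairs."""
    ring = list(points)
    if len(ring) < 3:
        raise ValueError("a polygon needs at least three distinct points")
    if ring[0] != ring[-1]:
        ring.append(ring[0])  # close the loop by repeating the first point
    body = ",".join(f"{lon} {lat}" for lon, lat in ring)
    return f"IN_POLYGON('{body}')"


triangle = [(16.321011, 4.123503), (16.137676, 5.947911), (16.560993, 5.935276)]
query = f"select * from source_index where {polygon_predicate(triangle)}"
print(query)
```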
+## Reference
+
+```
+[1] https://issues.apache.org/jira/browse/CARBONDATA-3548
+[2] https://gistbok.ucgis.org/topic-keywords/indexing
+[3] https://en.wikipedia.org/wiki/Z-order_curve
+```
\ No newline at end of file
