[GitHub] [carbondata] pawanmalwal commented on a change in pull request #3520: [WIP]add spatial-index user guid to doc

GitBox Wed, 08 Jan 2020 02:26:30 -0800

pawanmalwal commented on a change in pull request #3520: [WIP]add spatial-index 
user guid to doc
URL: https://github.com/apache/carbondata/pull/3520#discussion_r364160576


 ##########
 File path: docs/spatial-index-guide.md
 ##########
 @@ -0,0 +1,94 @@
+<!--
+    Licensed to the Apache Software Foundation (ASF) under one or more 
+    contributor license agreements.  See the NOTICE file distributed with
+    this work for additional information regarding copyright ownership. 
+    The ASF licenses this file to you under the Apache License, Version 2.0
+    (the "License"); you may not use this file except in compliance with 
+    the License.  You may obtain a copy of the License at
+
+      http://www.apache.org/licenses/LICENSE-2.0
+    
+    Unless required by applicable law or agreed to in writing, software 
+    distributed under the License is distributed on an "AS IS" BASIS, 
+    WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+    See the License for the specific language governing permissions and 
+    limitations under the License.
+-->
+
+## What is a spatial index
+
+A spatial index is a data structure that allows spatial objects to be accessed 
efficiently. It is a common technique used by spatial databases. Without 
indexing, any search for a feature would require a "sequential scan" of every 
record in the database, resulting in much longer processing time. When a 
spatial index is built, the minimum bounding rectangle (MBR) of each object 
serves as an approximation of that object. The various types of spatial 
indices in commercial and open-source databases yield measurable performance 
differences. Spatial indexing techniques play a central role in time-critical 
applications and in the manipulation of spatial big data.
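
As a minimal sketch of the MBR approximation (not CarbonData code), the Python below computes each object's MBR and keeps only the objects whose MBR overlaps a query rectangle. This is the cheap "filter" step of the common filter-and-refine pattern. The function names and the tuple layout are hypothetical, and a real index would organize the MBRs in a tree (for example an R-tree) rather than looping over every object.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]               # (x, y), e.g. (longitude, latitude)
Rect = Tuple[float, float, float, float]  # (min_x, min_y, max_x, max_y)


def mbr(points: List[Point]) -> Rect:
    """Return the minimum bounding rectangle of a non-empty list of points."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return (min(xs), min(ys), max(xs), max(ys))


def rects_intersect(a: Rect, b: Rect) -> bool:
    """True if two axis-aligned rectangles overlap (touching edges count)."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def filter_candidates(objects: Dict[str, List[Point]], query: Rect) -> List[str]:
    """Filter step: keep objects whose MBR intersects the query rectangle.

    The survivors are only candidates; an exact geometry test (the refine
    step) is still needed to confirm each match.
    """
    return [name for name, shape in objects.items()
            if rects_intersect(mbr(shape), query)]
```

With `objects = {"a": [(0, 0), (1, 1)], "b": [(5, 5), (6, 7)], "c": [(0.5, 2), (2, 3)]}` and the query rectangle `(0, 0, 1.5, 1.5)`, only `"a"` survives the filter, so only one exact geometry test would be needed.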
+
+
+
+## How does CarbonData implement a spatial index
+
+Many components implement spatial indexing, such as GeoSpark, which uses the 
GeoMesa format for spatial queries. CarbonData takes a different approach: its 
spatial index works more like a UDF. At its core, it uses grid coordinates to 
generate a coordinate-based hash ID which, like a Z-order curve, is regionally 
continuous.
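
To illustrate what "regionally continuous" means, here is a hedged sketch of a Z-order (Morton) code built by interleaving the bits of a grid cell's row and column. The bit layout (column bits on even positions, row bits on odd positions) and the name `interleave_bits` are assumptions for illustration, not CarbonData's actual hash function.

```python
def interleave_bits(row: int, col: int, bits: int) -> int:
    """Morton (Z-order) code: interleave the low `bits` bits of row and col.

    Assumed layout: bit i of col goes to position 2*i and bit i of row goes
    to position 2*i + 1. CarbonData's real layout may differ.
    """
    code = 0
    for i in range(bits):
        code |= ((col >> i) & 1) << (2 * i)
        code |= ((row >> i) & 1) << (2 * i + 1)
    return code


# IDs for a 4 x 4 grid, one list per row
grid = [[interleave_bits(r, c, 2) for c in range(4)] for r in range(4)]
```

Here `grid` is `[[0, 1, 4, 5], [2, 3, 6, 7], [8, 9, 12, 13], [10, 11, 14, 15]]`, so every aligned 2 x 2 block of cells (for example rows 2-3, columns 2-3) maps to a contiguous run of IDs (12 to 15). That property is what lets a spatial region be described by a short list of ID ranges.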
+
+CarbonData implements a grid spatial index, which requires the data to be 
gridded when it is loaded into segments. Each grid cell covers a range of 
latitude and longitude, and the cell size can be specified by the user. As a 
result, the coordinates of the loaded points are discrete rather than 
continuous.
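
The gridding step can be sketched as floor division of each coordinate by the cell size. The grid origin, the use of degrees as the cell-size unit, and the function name `to_grid_cell` are hypothetical simplifications; CarbonData's real configuration and units may differ.

```python
import math
from typing import Tuple


def to_grid_cell(lon: float, lat: float, origin_lon: float, origin_lat: float,
                 cell_size_deg: float) -> Tuple[int, int]:
    """Map a coordinate to the (row, col) of the grid cell that contains it.

    Hypothetical parameters: the grid's lower-left corner is
    (origin_lon, origin_lat) and every cell is cell_size_deg wide and tall.
    """
    col = math.floor((lon - origin_lon) / cell_size_deg)
    row = math.floor((lat - origin_lat) / cell_size_deg)
    return row, col
```

With origin (0, 0) and a cell size of 0.5, both (0.1, 0.2) and (0.3, 0.4) fall into cell (0, 0). Distinct input points collapse onto the same cell, which is the discretization described above.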
+
+The figure below shows how points relate to the grid: the black point is the 
center of a grid cell, and the red point lies somewhere inside that cell. 
Because the red point is inside the cell, it can be replaced by the cell's 
center point, which records that the point lies within that cell. Therefore, 
during data load, the coordinates of every point in a cell are replaced by the 
black center point. At the same time, CarbonData generates a hash ID from the 
row and column coordinates of the cell; these hash IDs follow the same Z order 
that is used when querying. For the detailed conversion algorithm, refer to the 
spatial index design document.
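
Putting the load-time steps together, the following sketch snaps a point to the center of its cell (the black point) and derives a Z-order hash ID from the cell's row and column. The grid origin, the cell size in degrees, the bit layout in `morton`, and the function names are all assumptions; CarbonData's actual conversion is specified in its design document.

```python
import math
from typing import Tuple


def morton(row: int, col: int, bits: int = 16) -> int:
    """Interleave row/col bits into a Z-order ID (assumed: col on even bits)."""
    code = 0
    for i in range(bits):
        code |= ((col >> i) & 1) << (2 * i)
        code |= ((row >> i) & 1) << (2 * i + 1)
    return code


def grid_point(lon: float, lat: float, origin: Tuple[float, float],
               cell: float) -> Tuple[Tuple[float, float], int]:
    """Snap a point to its cell center and compute the cell's hash ID.

    Hypothetical names: `origin` is the grid's lower-left corner and `cell`
    is the cell edge length.
    """
    col = math.floor((lon - origin[0]) / cell)
    row = math.floor((lat - origin[1]) / cell)
    # The center replaces the original coordinates of every point in the cell.
    center = (origin[0] + (col + 0.5) * cell, origin[1] + (row + 0.5) * cell)
    return center, morton(row, col)
```

Two different points in the same cell, such as (0.1, 0.2) and (0.3, 0.4) with origin (0, 0) and cell size 0.5, produce the same center (0.25, 0.25) and the same hash ID 0, so after loading they are indistinguishable by location.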
+
+![Spatial index grid and points](../docs/images/spatial-index-1.png?raw=true)
+
+When querying, the user supplies the real-world coordinates of a polygon. 
CarbonData uses this polygon, together with the spatial region information 
passed in when the table was created, to build a quad tree. The nodes in the quad tree are composed of hash ids 
generated by the row and column information projected in the polygon area and 
group photo in map area. A grid cell is considered selected when its center 
point is not disjoint from the query polygon. In the following figure, the user 
selects a quadrilateral polygon, and the grid cells whose center points fall in 
the region are used to generate the quad tree. During the query, a list of 
continuous hash ID ranges is generated, for example [97->97  99->99  102->102  
104->111  120->120  122->123  151->151  157->158  159->159  192->208  210->210  
216->216  225->225  228->229]; each range in the list represents a continuous 
grid area. CarbonData uses this range list for pruning and filtering. For 
details, see https://issues.apache.org/jira/browse/CARBONDATA-3548
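
A simplified sketch of the query side follows. Instead of building a quad tree, it tests every cell center against the polygon with a ray-casting point-in-polygon check, sorts the Z-order IDs of the selected cells, and merges consecutive IDs into `(start, end)` ranges in the same spirit as the `97->97 ... 228->229` list. The brute-force enumeration is a swapped-in simplification that gives up the quad tree's efficiency for brevity. The unit cell size, the origin at (0, 0), the bit layout, and all names are hypothetical. `in_ranges` only shows a per-ID membership check by binary search, not how CarbonData applies the ranges to prune data.

```python
import bisect
from typing import List, Tuple

Point = Tuple[float, float]  # (x, y) in grid units, origin at (0, 0) (assumption)


def morton(row: int, col: int, bits: int = 16) -> int:
    """Z-order ID by bit interleaving (assumed: col on even bits, row on odd bits)."""
    code = 0
    for i in range(bits):
        code |= ((col >> i) & 1) << (2 * i)
        code |= ((row >> i) & 1) << (2 * i + 1)
    return code


def point_in_polygon(p: Point, poly: List[Point]) -> bool:
    """Ray casting: count edge crossings to the right of p; odd means inside.

    Points lying exactly on an edge are not handled specially.
    """
    x, y = p
    inside = False
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the horizontal line through p
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def selected_ranges(poly: List[Point], rows: int, cols: int,
                    cell: float = 1.0) -> List[Tuple[int, int]]:
    """Select cells whose center lies in poly and merge their IDs into ranges."""
    ids = sorted(
        morton(r, c)
        for r in range(rows)
        for c in range(cols)
        if point_in_polygon(((c + 0.5) * cell, (r + 0.5) * cell), poly)
    )
    merged: List[List[int]] = []
    for h in ids:
        if merged and merged[-1][1] + 1 == h:
            merged[-1][1] = h  # extend the current continuous run
        else:
            merged.append([h, h])  # start a new run
    return [(start, end) for start, end in merged]


def in_ranges(h: int, ranges: List[Tuple[int, int]]) -> bool:
    """Binary search: is hash ID h inside one of the sorted, disjoint ranges?"""
    i = bisect.bisect_right(ranges, (h, float("inf"))) - 1
    return i >= 0 and ranges[i][0] <= h <= ranges[i][1]
```

For a 4 x 4 grid and the square polygon `[(0, 0), (2, 0), (2, 2), (0, 2)]`, the four selected cells have IDs 0 to 3 and collapse into the single range `[(0, 3)]`. The strip `[(1, 0), (3, 0), (3, 1), (1, 1)]` selects cells with IDs 1 and 4. These IDs are not adjacent in Z order and therefore stay as two ranges, `[(1, 1), (4, 4)]`.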
 
 Review comment:
    The nodes in the quad tree are composed of hash ids generated by the row 
and column information projected in the polygon area and **group photo** in map 
area.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services
