[GitHub] [carbondata] xuchuanyin commented on a change in pull request #3523: [doc_chinese_doc ]add Create table scene by day sortColumn effect analysis chinese doc …

GitBox Tue, 07 Jan 2020 03:19:47 -0800

xuchuanyin commented on a change in pull request #3523: [doc_chinese_doc ]add 
Create table scene by day sortColumn effect analysis chinese doc …
URL: https://github.com/apache/carbondata/pull/3523#discussion_r363698791


 ##########
 File path: docs/zh_cn/明细数据查询的典型Carbon应用-点查 过滤条件.md
 ##########
 @@ -0,0 +1,525 @@
+<!--
+    Licensed to the Apache Software Foundation (ASF) under one or more 
+    contributor license agreements.  See the NOTICE file distributed with
+    this work for additional information regarding copyright ownership. 
+    The ASF licenses this file to you under the Apache License, Version 2.0
+    (the "License"); you may not use this file except in compliance with 
+    the License.  You may obtain a copy of the License at
+
+      http://www.apache.org/licenses/LICENSE-2.0
+    
+    Unless required by applicable law or agreed to in writing, software 
+    distributed under the License is distributed on an "AS IS" BASIS, 
+    WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+    See the License for the specific language governing permissions and 
+    limitations under the License.
+-->
+
+# Typical CarbonData Application for Detail Data Queries: Point Query + Filter Conditions
+
+## Background
+
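+This document explains how to configure table creation, data loading, and query parameters when using CarbonData for detail data queries, and guides users in choosing suitable dictionary, SORT_COLUMNS, and SORT_SCOPE settings when creating tables. It also reports load and query times measured under a set of different configurations, so that users can pick the parameters that fit their own workload and scenario.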
+
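+The tables and queries covered here share the following characteristics: each table holds a large number of records and has many columns, roughly 100 to 600, and table sizes range from tens of millions to tens of billions of records. Queries are mainly point queries and filters, with no aggregation, and occasionally join a dimension table. Data is ingested in batches at intervals of about 5 minutes, and one table is created per day. There may be 20 or more concurrent queries.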
+
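+Typical query patterns are listed below. The fifth one, sum, is included only for performance comparison.
+
+1. Point query: select * from table where id_a=' ' limit 1000;
+
+2. Fuzzy query: select * from table where id_a like '1234%' limit 1000;
+
+3. Total record count: select count(1) from table;
+
+4. Maximum/minimum value: select max(id_a), min(id_a) from table;
+
+5. Sum (for performance comparison only): select sum(id_a) from table;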
+
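+Data characteristics: the columns are mainly of type int, bigint, and string, and describe things such as number columns, time columns, and ID columns. There are no complex data types.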
+
+
+
+## Test Environment
+
+| Cluster        | CPU                  | vCore | Memory | Disk                    | Description |
+| -------------- | -------------------- | ----- | ------ | ----------------------- | ----------- |
+| Hadoop cluster | Gold 6132 [email protected] | 56    | 256GB  | SATA disks, 12 in RAID0 | 2 namenodes, 6 datanodes; the query queue is allocated 1/6 of the resources, equivalent to one node |
 
 Review comment:
   "SATA disks, 12 in RAID0"
   ---
   Is RAID really configured on the HDFS cluster? Please confirm.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services
