[jira] [Updated] (CARBONDATA-2536) MV Dataset - When user query has substring() of column under group by, which is same as the MV group by column, then the user query is not accessing the data from the MV datamap table.

Prasanna Ravichandran (JIRA) Fri, 25 May 2018 02:43:07 -0700

     [ 
https://issues.apache.org/jira/browse/CARBONDATA-2536?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Prasanna Ravichandran updated CARBONDATA-2536:
----------------------------------------------
    Description: 
MV Dataset - When user query has substring() of column under group by, which is 
same as the  MV group by column, then the user query is not accessing the data 
from the MV datamap table. It is accessing the data from the main table only.

Test query:

carbon.sql("CREATE TABLE originTable (empno int, empname String, designation 
String, doj Timestamp, workgroupcategory int, workgroupcategoryname String, 
deptno int, deptname String, projectcode int, projectjoindate Timestamp, 
projectenddate Timestamp,attendance int, utilization int,salary int) STORED BY 
'org.apache.carbondata.format'").show()

++
 
++
 ++

 

carbon.sql("LOAD DATA local inpath 'hdfs://hacluster/user/prasanna/data.csv' 
INTO TABLE originTable OPTIONS('DELIMITER'= ',', 'QUOTECHAR'= 
'\"','timestampformat'='dd-MM-yyyy')").show()

++
 
++
 ++

 

scala> carbon.sql("Create datamap m2 using 'mv' as select sum(salary) from 
originTable group by deptname").show(200,false)
 ++
 
++
 ++

scala> carbon.sql("rebuild datamap m2").show(200,false)
 ++
 
++
 ++

 

scala> carbon.sql("explain select sum(salary) from originTable group by 
substring(deptname,2,2)")
 res60: org.apache.spark.sql.DataFrame = [plan: string]

scala> carbon.sql("explain select sum(salary) from originTable group by 
substring(deptname,2,2)").show(200,false)
 
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|plan|

+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|== CarbonData Profiler ==
 Table Scan on origintable
 - total blocklets: 1
 - filter: none
 - pruned by Main DataMap
 - skipped blocklets: 0|
|== Physical Plan ==
 *HashAggregate(keys=[substring(deptname#1138, 2, 2)#1255|#1138, 2, 2)#1255], 
functions=[sum(cast(salary#1144 as bigint))|#1144 as bigint))])
 +- Exchange hashpartitioning(substring(deptname#1138, 2, 2)#1255, 200)
 +- *HashAggregate(keys=[substring(deptname#1138, 2, 2) AS 
substring(deptname#1138, 2, 2)#1255|#1138, 2, 2) AS substring(deptname#1138, 2, 
2)#1255], functions=[partial_sum(cast(salary#1144 as bigint))|#1144 as 
bigint))])
 +- *BatchedScan CarbonDatasourceHadoopRelation [ Database name :default, Table 
name :origintable, Schema :Some(StructType(StructField(empno,IntegerType,true), 
StructField(empname,StringType,true), StructField(designation,StringType,true), 
StructField(doj,TimestampType,true), 
StructField(workgroupcategory,IntegerType,true), 
StructField(workgroupcategoryname,StringType,true), 
StructField(deptno,IntegerType,true), StructField(deptname,StringType,true), 
StructField(projectcode,IntegerType,true), 
StructField(projectjoindate,TimestampType,true), 
StructField(projectenddate,TimestampType,true), 
StructField(attendance,IntegerType,true), 
StructField(utilization,IntegerType,true), 
StructField(salary,IntegerType,true))) ] 
default.origintable[deptname#1138,salary#1144|#1138,salary#1144]|

+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

 

Please check the attached document for reference.

  was:
MV Dataset - When user query has substring() of column under group, which is 
same as the  MV group by column, then the user query is not accessing the data 
from the MV datamap table. It is accessing the data from the main table only.

Test query:

carbon.sql("CREATE TABLE originTable (empno int, empname String, designation 
String, doj Timestamp, workgroupcategory int, workgroupcategoryname String, 
deptno int, deptname String, projectcode int, projectjoindate Timestamp, 
projectenddate Timestamp,attendance int, utilization int,salary int) STORED BY 
'org.apache.carbondata.format'").show()

++
||
++
++

 

carbon.sql("LOAD DATA local inpath 'hdfs://hacluster/user/prasanna/data.csv' 
INTO TABLE originTable OPTIONS('DELIMITER'= ',', 'QUOTECHAR'= 
'\"','timestampformat'='dd-MM-yyyy')").show()

++
||
++
++

 

scala> carbon.sql("Create datamap m2 using 'mv' as select sum(salary) from 
originTable group by deptname").show(200,false)
++
||
++
++


scala> carbon.sql("rebuild datamap m2").show(200,false)
++
||
++
++

 

scala> carbon.sql("explain select sum(salary) from originTable group by 
substring(deptname,2,2)")
res60: org.apache.spark.sql.DataFrame = [plan: string]

scala> carbon.sql("explain select sum(salary) from originTable group by 
substring(deptname,2,2)").show(200,false)
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|plan |
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|== CarbonData Profiler ==
Table Scan on origintable
 - total blocklets: 1
 - filter: none
 - pruned by Main DataMap
 - skipped blocklets: 0
 |
|== Physical Plan ==
*HashAggregate(keys=[substring(deptname#1138, 2, 2)#1255], 
functions=[sum(cast(salary#1144 as bigint))])
+- Exchange hashpartitioning(substring(deptname#1138, 2, 2)#1255, 200)
 +- *HashAggregate(keys=[substring(deptname#1138, 2, 2) AS 
substring(deptname#1138, 2, 2)#1255], functions=[partial_sum(cast(salary#1144 
as bigint))])
 +- *BatchedScan CarbonDatasourceHadoopRelation [ Database name :default, Table 
name :origintable, Schema :Some(StructType(StructField(empno,IntegerType,true), 
StructField(empname,StringType,true), StructField(designation,StringType,true), 
StructField(doj,TimestampType,true), 
StructField(workgroupcategory,IntegerType,true), 
StructField(workgroupcategoryname,StringType,true), 
StructField(deptno,IntegerType,true), StructField(deptname,StringType,true), 
StructField(projectcode,IntegerType,true), 
StructField(projectjoindate,TimestampType,true), 
StructField(projectenddate,TimestampType,true), 
StructField(attendance,IntegerType,true), 
StructField(utilization,IntegerType,true), 
StructField(salary,IntegerType,true))) ] 
default.origintable[deptname#1138,salary#1144]|
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

 

Please check the attached document for reference.


> MV Dataset - When user query has substring() of column under group by, which 
> is same as the MV group by column, then the user query is not accessing the 
> data from the MV datamap table.
> ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: CARBONDATA-2536
>                 URL: https://issues.apache.org/jira/browse/CARBONDATA-2536
>             Project: CarbonData
>          Issue Type: Bug
>          Components: data-query
>         Environment: 3 node opensource ANT Cluster
>            Reporter: Prasanna Ravichandran
>            Priority: Minor
>              Labels: Carbondata, MV, Materialistic_Views
>
> MV Dataset - When user query has substring() of column under group by, which 
> is same as the  MV group by column, then the user query is not accessing the 
> data from the MV datamap table. It is accessing the data from the main table 
> only.
> Test query:
> carbon.sql("CREATE TABLE originTable (empno int, empname String, designation 
> String, doj Timestamp, workgroupcategory int, workgroupcategoryname String, 
> deptno int, deptname String, projectcode int, projectjoindate Timestamp, 
> projectenddate Timestamp,attendance int, utilization int,salary int) STORED 
> BY 'org.apache.carbondata.format'").show()
> ++
>  
> ++
>  ++
>  
> carbon.sql("LOAD DATA local inpath 'hdfs://hacluster/user/prasanna/data.csv' 
> INTO TABLE originTable OPTIONS('DELIMITER'= ',', 'QUOTECHAR'= 
> '\"','timestampformat'='dd-MM-yyyy')").show()
> ++
>  
> ++
>  ++
>  
> scala> carbon.sql("Create datamap m2 using 'mv' as select sum(salary) from 
> originTable group by deptname").show(200,false)
>  ++
>  
> ++
>  ++
> scala> carbon.sql("rebuild datamap m2").show(200,false)
>  ++
>  
> ++
>  ++
>  
> scala> carbon.sql("explain select sum(salary) from originTable group by 
> substring(deptname,2,2)")
>  res60: org.apache.spark.sql.DataFrame = [plan: string]
> scala> carbon.sql("explain select sum(salary) from originTable group by 
> substring(deptname,2,2)").show(200,false)
>  
> +-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
> |plan|
> +-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
> |== CarbonData Profiler ==
>  Table Scan on origintable
>  - total blocklets: 1
>  - filter: none
>  - pruned by Main DataMap
>  - skipped blocklets: 0|
> |== Physical Plan ==
>  *HashAggregate(keys=[substring(deptname#1138, 2, 2)#1255|#1138, 2, 2)#1255], 
> functions=[sum(cast(salary#1144 as bigint))|#1144 as bigint))])
>  +- Exchange hashpartitioning(substring(deptname#1138, 2, 2)#1255, 200)
>  +- *HashAggregate(keys=[substring(deptname#1138, 2, 2) AS 
> substring(deptname#1138, 2, 2)#1255|#1138, 2, 2) AS substring(deptname#1138, 
> 2, 2)#1255], functions=[partial_sum(cast(salary#1144 as bigint))|#1144 as 
> bigint))])
>  +- *BatchedScan CarbonDatasourceHadoopRelation [ Database name :default, 
> Table name :origintable, Schema 
> :Some(StructType(StructField(empno,IntegerType,true), 
> StructField(empname,StringType,true), 
> StructField(designation,StringType,true), 
> StructField(doj,TimestampType,true), 
> StructField(workgroupcategory,IntegerType,true), 
> StructField(workgroupcategoryname,StringType,true), 
> StructField(deptno,IntegerType,true), StructField(deptname,StringType,true), 
> StructField(projectcode,IntegerType,true), 
> StructField(projectjoindate,TimestampType,true), 
> StructField(projectenddate,TimestampType,true), 
> StructField(attendance,IntegerType,true), 
> StructField(utilization,IntegerType,true), 
> StructField(salary,IntegerType,true))) ] 
> default.origintable[deptname#1138,salary#1144|#1138,salary#1144]|
> +-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
>  
> Please check the attached document for reference.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Updated] (CARBONDATA-2536) MV Dataset - When user query has substring() of column under group by, which is same as the MV group by column, then the user query is not accessing the data from the MV datamap table.

Reply via email to