> On March 19, 2016, 3:52 p.m., Rajat Khandelwal wrote:
> > I'm still having trouble understanding the query `select customer_id,
> > customer_interests, unit_sales from sales where time_range_in(order_time,
> > '2015-04-11-00', '2015-04-13-00') and customer_interests not in ('Food')`.
> > The name `customer_interests` seems like a collection field. Which it is,
> > looking at the earlier query mentioned in testing done section. The filter
> > in the query is `customer_interests not in ('Food')` while the results make
> > it seem like the filter that's finally applied is `'Food' not in
> > customer_interests`.
> >
> > Secondly, to me, the query `select customer_id, customer_interests,
> > unit_sales from sales where time_range_in(order_time, '2015-04-11-00',
> > '2015-04-13-00') and 'Food' not in customer_interests` seems intuitive to
> > understand, since looking at field named `customer_interests`, there's an
> > automatic assumption involded that it's an array field. And if it's an
> > array field, `element in array` is the widely recognized syntax than `array
> > in element`.
> >
> > Would like to know other people's thoughts too.
> >
> > Will do the code review separately.
Rajat,
What you said looks fine, only if we are always doing flattening. Then field
itself can be defined as array in schema. But the field type is String and user
can ask for rows unflattened. and there are usecases where users are interested
in unflattened rows as well.
May be i can rename the field to customer_interest instead of
customer_interests, if that avoids the confusion.
- Amareshwari
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/44386/#review124407
-----------------------------------------------------------
On March 17, 2016, 12:22 a.m., Amareshwari Sriramadasu wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/44386/
> -----------------------------------------------------------
>
> (Updated March 17, 2016, 12:22 a.m.)
>
>
> Review request for lens.
>
>
> Bugs: LENS-788
> https://issues.apache.org/jira/browse/LENS-788
>
>
> Repository: lens
>
>
> Description
> -------
>
> Changes include :
> - Adds option to flatten early, if required. Default value is false. So,
> aggregation are run over expressions if value is false.
> - IN, NOT IN, = and != filters are converted to array_contains filters. Added
> config to override for filter funciton to use.
> - Did some refactoring to update fromString in QueryAST
>
>
> Diffs
> -----
>
> lens-cube/src/main/java/org/apache/lens/cube/parse/CandidateFact.java
> 82ca4f4
> lens-cube/src/main/java/org/apache/lens/cube/parse/CubeQueryConfUtil.java
> d96b567
> lens-cube/src/main/java/org/apache/lens/cube/parse/CubeQueryContext.java
> b8b6db9
> lens-cube/src/main/java/org/apache/lens/cube/parse/DefaultAliasDecider.java
> dadbfa0
> lens-cube/src/main/java/org/apache/lens/cube/parse/DefaultQueryAST.java
> a403e36
> lens-cube/src/main/java/org/apache/lens/cube/parse/DimHQLContext.java
> 318c82a
> lens-cube/src/main/java/org/apache/lens/cube/parse/DimOnlyHQLContext.java
> d22287b
> lens-cube/src/main/java/org/apache/lens/cube/parse/HQLParser.java 8d6105f
> lens-cube/src/main/java/org/apache/lens/cube/parse/JoinResolver.java
> b861bb6
> lens-cube/src/main/java/org/apache/lens/cube/parse/MultiFactHQLContext.java
> 9c18b7e
> lens-cube/src/main/java/org/apache/lens/cube/parse/QueryAST.java 31680ca
>
> lens-cube/src/main/java/org/apache/lens/cube/parse/SingleFactMultiStorageHQLContext.java
> 63cb388
>
> lens-cube/src/main/java/org/apache/lens/cube/parse/SingleFactSingleStorageHQLContext.java
> b1a3b3f
>
> lens-cube/src/main/java/org/apache/lens/cube/parse/join/AutoJoinContext.java
> e14a898
>
> lens-cube/src/main/java/org/apache/lens/cube/parse/join/BridgeTableJoinContext.java
> PRE-CREATION
> lens-cube/src/main/resources/olap-query-conf.xml 1436cd1
> lens-cube/src/test/java/org/apache/lens/cube/parse/CubeTestSetup.java
> 42decc6
>
> lens-cube/src/test/java/org/apache/lens/cube/parse/TestBridgeTableQueries.java
> e8ad8ff
>
> lens-cube/src/test/java/org/apache/lens/cube/parse/TestDefaultAliasDecider.java
> PRE-CREATION
> lens-cube/src/test/java/org/apache/lens/cube/parse/TestHQLParser.java
> f9d7457
> lens-cube/src/test/java/org/apache/lens/cube/parse/TestQuery.java cd20fef
>
> lens-cube/src/test/java/org/apache/lens/cube/parse/join/TestBridgeTableJoinCtx.java
> PRE-CREATION
> lens-dist/src/main/assembly/bin-dist.xml dede085
>
> lens-examples/src/main/java/org/apache/lens/examples/PopulateSampleMetastore.java
> cfe3465
> lens-examples/src/main/java/org/apache/lens/examples/SampleMetastore.java
> 0c2301b
> lens-examples/src/main/resources/cube-queries.sql 9f4a353
> lens-examples/src/main/resources/customer-interests-local-part.xml
> PRE-CREATION
> lens-examples/src/main/resources/customer-interests-local.data PRE-CREATION
> lens-examples/src/main/resources/customer-interests.xml PRE-CREATION
> lens-examples/src/main/resources/customer_interests_table.xml PRE-CREATION
> lens-examples/src/main/resources/db-storage-schema.sql 0aceeab
> lens-examples/src/main/resources/interests-local-part.xml PRE-CREATION
> lens-examples/src/main/resources/interests-local.data PRE-CREATION
> lens-examples/src/main/resources/interests.xml PRE-CREATION
> lens-examples/src/main/resources/interests_table.xml PRE-CREATION
> lens-examples/src/main/resources/sales-cube.xml e944821
> src/site/apt/user/olap-cube.apt 4bed623
> src/site/apt/user/olap-query-conf.apt 6f84869
> tools/conf/client/lens-client-site.xml 706e356
>
> Diff: https://reviews.apache.org/r/44386/diff/
>
>
> Testing
> -------
>
> [INFO]
> ------------------------------------------------------------------------
> [INFO] Reactor Summary:
> [INFO]
> [INFO] Lens Checkstyle Rules ............................. SUCCESS [1.906s]
> [INFO] Lens .............................................. SUCCESS [3.122s]
> [INFO] Lens API .......................................... SUCCESS [28.599s]
> [INFO] Lens API for server and extensions ................ SUCCESS [19.594s]
> [INFO] Lens Cube ......................................... SUCCESS
> [11:12.234s]
> [INFO] Lens DB storage ................................... SUCCESS [19.512s]
> [INFO] Lens Query Library ................................ SUCCESS [17.144s]
> [INFO] Lens Hive Driver .................................. SUCCESS [2:52.782s]
> [INFO] Lens Driver for JDBC .............................. SUCCESS [36.328s]
> [INFO] Lens Elastic Search Driver ........................ SUCCESS [17.957s]
> [INFO] Lens Server ....................................... SUCCESS
> [17:40.805s]
> [INFO] Lens client ....................................... SUCCESS [34.762s]
> [INFO] Lens CLI .......................................... SUCCESS [2:53.130s]
> [INFO] Lens Examples ..................................... SUCCESS [10.135s]
> [INFO] Lens Ship Jars to Distributed Cache ............... SUCCESS [0.737s]
> [INFO] Lens Distribution ................................. SUCCESS [9.731s]
> [INFO] Lens ML Lib ....................................... FAILURE
> [40:05.058s]
> [INFO] Lens ML Ext Distribution .......................... SKIPPED
> [INFO] Lens Regression ................................... SKIPPED
> [INFO] Lens UI ........................................... SKIPPED
> [INFO]
> ------------------------------------------------------------------------
> [INFO] BUILD FAILURE
> [INFO]
> ------------------------------------------------------------------------
> [INFO] Total time: 1:18:04.413s
> [INFO] Finished at: Wed Mar 16 14:47:06 UTC 2016
> [INFO] Final Memory: 196M/1834M
> [INFO]
> ------------------------------------------------------------------------
>
> Updated examples with bridge tables and added queries.
>
> ----
> lens-shell>select customer_id, customer_interests, unit_sales from sales
> where time_range_in(order_time, '2015-04-11-00', '2015-04-13-00')
> 16 Mar 2016 13:24:43 [Spring Shell] INFO cliLogger - Driver query: 'SELECT
> ( sales . customer_id ), ( customer_interests_chain . balias0 ), sum(( sales
> . unit_sales )) FROM ex1.local_sales_aggr_fact1 sales left outer join
> ex1.local_customer_table customer on sales.customer_id = customer.id and
> (customer.dt = 'latest') left outer join (select
> customer_interests.customer_id as customer_id,collect_set((
> customer_interests_chain . name )) as balias0 from
> ex1.local_customer_interests_table customer_interests join
> ex1.local_interests_table customer_interests_chain on
> customer_interests.interest_id = customer_interests_chain.id and
> (customer_interests_chain.dt = 'latest') and (customer_interests.dt =
> 'latest') group by customer_interests.customer_id) customer_interests_chain
> on customer.id = customer_interests_chain.customer_id WHERE (((( sales . ot )
> = '2015-04-11' ) or (( sales . ot ) = '2015-04-12' ))) GROUP BY ( sales .
> customer_id ), ( customer_interests_chain . balias0 )' and Dri
ver handle: OperationHandle [opType=EXECUTE_STATEMENT,
getHandleIdentifier()=359ad260-7d03-4ea0-9c7e-254391dd52ed]
> 16 Mar 2016 13:24:53 [Spring Shell] INFO cliLogger - Query Status:Progress:
> 0.1818181872367859
> Status: RUNNING
> Status Message: Query is running in HiveServer!
> Is Result Set Available: false
> Progress Message:
> [{"taskId":"Stage-12","type":"MAPREDLOCAL","externalHandle":"Stage-12","taskState":"FINISHED_STATE"},{"taskId":"Stage-2","type":"MAPRED","externalHandle":null,"taskState":"FINISHED_STATE"},{"taskId":"Stage-9","type":"CONDITIONAL","externalHandle":"Stage-9","taskState":"RUNNING_STATE"},{"taskId":"Stage-11","type":"MAPREDLOCAL","externalHandle":"Stage-11","taskState":"RUNNING_STATE"},{"taskId":"Stage-11","type":"MAPREDLOCAL","externalHandle":"Stage-11","taskState":"RUNNING_STATE"},{"taskId":"Stage-3","type":"MAPRED","externalHandle":null,"taskState":"UNKNOWN_STATE"},{"taskId":"Stage-8","type":"MAPRED","externalHandle":null,"taskState":"UNKNOWN_STATE"},{"taskId":"Stage-8","type":"MAPRED","externalHandle":null,"taskState":"UNKNOWN_STATE"},{"taskId":"Stage-4","type":"MAPRED","externalHandle":null,"taskState":"UNKNOWN_STATE"},{"taskId":"Stage-4","type":"MAPRED","externalHandle":null,"taskState":"UNKNOWN_STATE"},{"taskId":"Stage-4","type":"MAPRED","externalHandle":null,"
taskState":"UNKNOWN_STATE"}]
>
>
> sales.customer_id customer_interests_chain.balias0 _c2
> Result available in memory, attaching here:
>
> 1 ["Fashion","Food"] 2
> 2 null 2
> 2 rows processed in (27) seconds.
>
> lens-shell>select customer_id, customer_interests, unit_sales from sales
> where time_range_in(order_time, '2015-04-11-00', '2015-04-13-00') and
> customer_interests not in ('Food')
> 16 Mar 2016 13:25:17 [Spring Shell] INFO cliLogger - Driver query: 'SELECT
> ( sales . customer_id ), ( customer_interests_chain . balias0 ), sum(( sales
> . unit_sales )) FROM ex1.local_sales_aggr_fact1 sales left outer join
> ex1.local_customer_table customer on sales.customer_id = customer.id and
> (customer.dt = 'latest') left outer join (select
> customer_interests.customer_id as customer_id,collect_set((
> customer_interests_chain . name )) as balias0 from
> ex1.local_customer_interests_table customer_interests join
> ex1.local_interests_table customer_interests_chain on
> customer_interests.interest_id = customer_interests_chain.id and
> (customer_interests_chain.dt = 'latest') and (customer_interests.dt =
> 'latest') group by customer_interests.customer_id) customer_interests_chain
> on customer.id = customer_interests_chain.customer_id WHERE ((((( sales . ot
> ) = '2015-04-11' ) or (( sales . ot ) = '2015-04-12' )) and not
> array_contains(( customer_interests_chain . balias0 ), 'Food' ))) GROUP B
Y ( sales . customer_id ), ( customer_interests_chain . balias0 )' and Driver
handle: OperationHandle [opType=EXECUTE_STATEMENT,
getHandleIdentifier()=f945a7a7-7f2d-4cb2-b893-5886bd3753cd]
>
> 16 Mar 2016 13:25:37 [Spring Shell] INFO cliLogger - Query Status:Progress:
> 0.6363636255264282
> Status: RUNNING
> Status Message: Query is running in HiveServer!
> Is Result Set Available: false
> Progress Message:
> [{"taskId":"Stage-12","type":"MAPREDLOCAL","externalHandle":"Stage-12","taskState":"FINISHED_STATE"},{"taskId":"Stage-2","type":"MAPRED","externalHandle":null,"taskState":"FINISHED_STATE"},{"taskId":"Stage-9","type":"CONDITIONAL","externalHandle":"Stage-9","taskState":"FINISHED_STATE"},{"taskId":"Stage-11","type":"MAPREDLOCAL","externalHandle":"Stage-11","taskState":"FINISHED_STATE"},{"taskId":"Stage-11","type":"MAPREDLOCAL","externalHandle":"Stage-11","taskState":"FINISHED_STATE"},{"taskId":"Stage-3","type":"MAPRED","externalHandle":null,"taskState":"UNKNOWN_STATE"},{"taskId":"Stage-8","type":"MAPRED","externalHandle":null,"taskState":"FINISHED_STATE"},{"taskId":"Stage-8","type":"MAPRED","externalHandle":null,"taskState":"FINISHED_STATE"},{"taskId":"Stage-4","type":"MAPRED","externalHandle":null,"taskState":"RUNNING_STATE"},{"taskId":"Stage-4","type":"MAPRED","externalHandle":null,"taskState":"RUNNING_STATE"},{"taskId":"Stage-4","type":"MAPRED","externalHandle":n
ull,"taskState":"RUNNING_STATE"}]
>
> sales.customer_id customer_interests_chain.balias0 _c2
> Result available in memory, attaching here:
>
> 2 null 2
> 1 rows processed in (27) seconds.
>
> lens-shell>select customer_id, customer_interests, unit_sales from sales
> where time_range_in(order_time, '2015-04-11-00', '2015-04-13-00') and
> customer_interests in ('Food')
> 16 Mar 2016 13:26:31 [Spring Shell] INFO cliLogger - Driver query: 'SELECT
> ( sales . customer_id ), ( customer_interests_chain . balias0 ), sum(( sales
> . unit_sales )) FROM ex1.local_sales_aggr_fact1 sales left outer join
> ex1.local_customer_table customer on sales.customer_id = customer.id and
> (customer.dt = 'latest') left outer join (select
> customer_interests.customer_id as customer_id,collect_set((
> customer_interests_chain . name )) as balias0 from
> ex1.local_customer_interests_table customer_interests join
> ex1.local_interests_table customer_interests_chain on
> customer_interests.interest_id = customer_interests_chain.id and
> (customer_interests_chain.dt = 'latest') and (customer_interests.dt =
> 'latest') group by customer_interests.customer_id) customer_interests_chain
> on customer.id = customer_interests_chain.customer_id WHERE ((((( sales . ot
> ) = '2015-04-11' ) or (( sales . ot ) = '2015-04-12' )) and array_contains((
> customer_interests_chain . balias0 ), 'Food' ))) GROUP BY (
sales . customer_id ), ( customer_interests_chain . balias0 )' and Driver
handle: OperationHandle [opType=EXECUTE_STATEMENT,
getHandleIdentifier()=82df2138-8f33-4454-bd12-160cf38d7fba]
> 16 Mar 2016 13:26:51 [Spring Shell] INFO cliLogger - Query Status:Progress:
> 0.6363636255264282
> Status: RUNNING
> Status Message: Query is running in HiveServer!
> Is Result Set Available: false
> Progress Message:
> [{"taskId":"Stage-12","type":"MAPREDLOCAL","externalHandle":"Stage-12","taskState":"FINISHED_STATE"},{"taskId":"Stage-2","type":"MAPRED","externalHandle":null,"taskState":"FINISHED_STATE"},{"taskId":"Stage-9","type":"CONDITIONAL","externalHandle":"Stage-9","taskState":"FINISHED_STATE"},{"taskId":"Stage-11","type":"MAPREDLOCAL","externalHandle":"Stage-11","taskState":"FINISHED_STATE"},{"taskId":"Stage-11","type":"MAPREDLOCAL","externalHandle":"Stage-11","taskState":"FINISHED_STATE"},{"taskId":"Stage-3","type":"MAPRED","externalHandle":null,"taskState":"UNKNOWN_STATE"},{"taskId":"Stage-8","type":"MAPRED","externalHandle":null,"taskState":"FINISHED_STATE"},{"taskId":"Stage-8","type":"MAPRED","externalHandle":null,"taskState":"FINISHED_STATE"},{"taskId":"Stage-4","type":"MAPRED","externalHandle":null,"taskState":"RUNNING_STATE"},{"taskId":"Stage-4","type":"MAPRED","externalHandle":null,"taskState":"RUNNING_STATE"},{"taskId":"Stage-4","type":"MAPRED","externalHandle":n
ull,"taskState":"RUNNING_STATE"}]
>
> sales.customer_id customer_interests_chain.balias0 _c2
> Result available in memory, attaching here:
>
> 1 ["Fashion","Food"] 2
> 1 rows processed in (25) seconds.
> ----
>
>
> Thanks,
>
> Amareshwari Sriramadasu
>
>