Andrew Pilloud created BEAM-6224:
------------------------------------
Summary: Beam SQL Aggregation broken
Key: BEAM-6224
URL: https://issues.apache.org/jira/browse/BEAM-6224
Project: Beam
Issue Type: Bug
Components: dsl-sql
Affects Versions: 2.9.0, 2.10.0
Reporter: Andrew Pilloud
Assignee: Andrew Pilloud
Ran [the demo|https://docs.google.com/document/d/1JngKurr0wBZMsDokx3QrcLbyOhjNogE3FM7POVAowgc/edit] for SQL. One query fails with the following error; another just hangs.
{code:java}
the keyCoder of a GroupByKey must be deterministic{code}
{code:java}
CREATE EXTERNAL TABLE taxi_rides (
  event_timestamp TIMESTAMP,
  attributes MAP<VARCHAR, VARCHAR>,
  payload ROW<
    ride_id VARCHAR,
    point_idx INT,
    latitude DOUBLE,
    longitude DOUBLE,
    meter_reading DOUBLE,
    meter_increment DOUBLE,
    ride_status VARCHAR,
    passenger_count TINYINT>)
TYPE pubsub
LOCATION 'projects/pubsub-public-data/topics/taxirides-realtime'
TBLPROPERTIES '{"timestampAttributeKey": "ts"}';

WITH geo_cells AS (
  SELECT FLOOR(taxi_rides.payload.latitude / 0.05) * 0.05 AS reduced_lat,
         FLOOR(taxi_rides.payload.longitude / 0.05) * 0.05 AS reduced_lon,
         taxi_rides.event_timestamp
  FROM taxi_rides)
SELECT COUNT(*) AS num_events,
       geo_cells.reduced_lat,
       geo_cells.reduced_lon,
       TUMBLE_START(geo_cells.event_timestamp, INTERVAL '1' SECOND)
FROM geo_cells
GROUP BY geo_cells.reduced_lat,
         geo_cells.reduced_lon,
         TUMBLE(geo_cells.event_timestamp, INTERVAL '1' SECOND)
LIMIT 10;{code}
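
For reference, the aggregation the query above is expected to compute can be sketched in plain Python over in-memory data. This is only an illustration of the intended semantics (0.05-degree geo cells, 1-second tumbling windows, a count per group), not Beam's implementation and not a diagnosis of the failure. The names reduce_coord, tumble_start and count_events are hypothetical helpers introduced for this sketch, and each ride is assumed to be a (timestamp, latitude, longitude) tuple.

```python
import math
from collections import Counter
from datetime import datetime, timezone

CELL = 0.05  # cell size in degrees, taken from the query


def reduce_coord(value, cell=CELL):
    """Mirror FLOOR(value / 0.05) * 0.05 from the geo_cells subquery."""
    return math.floor(value / cell) * cell


def tumble_start(ts):
    """Start of the 1-second tumbling window that contains ts."""
    return ts.replace(microsecond=0)


def count_events(rides):
    """Count rides per (reduced_lat, reduced_lon, window start) group."""
    counts = Counter()
    for ts, lat, lon in rides:
        key = (reduce_coord(lat), reduce_coord(lon), tumble_start(ts))
        counts[key] += 1
    return counts
```

The sketch does not model LIMIT 10 or streaming input from Pub/Sub. As in the query, two of the three grouping key components are floating-point values.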