[
https://issues.apache.org/jira/browse/BEAM-7013?focusedWorklogId=330562&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-330562
]
ASF GitHub Bot logged work on BEAM-7013:
----------------------------------------
Author: ASF GitHub Bot
Created on: 18/Oct/19 15:20
Start Date: 18/Oct/19 15:20
Worklog Time Spent: 10m
Work Description: zfraa commented on pull request #9778: [BEAM-7013] Update BigQueryHllSketchCompatibilityIT to cover empty sketch cases
URL: https://github.com/apache/beam/pull/9778#discussion_r336529565
##########
File path: sdks/java/extensions/zetasketch/src/test/java/org/apache/beam/sdk/extensions/zetasketch/BigQueryHllSketchCompatibilityIT.java
##########
@@ -126,22 +145,49 @@ public static void deleteDataset() throws Exception {
}
/**
- * Test that HLL++ sketch computed in BigQuery can be processed by Beam. Hll sketch is computed by
- * {@code HLL_COUNT.INIT} in BigQuery and read into Beam; the test verifies that we can run {@link
- * HllCount.MergePartial} and {@link HllCount.Extract} on the sketch in Beam to get the correct
- * estimated count.
+ * Test that non-empty HLL++ sketch computed in BigQuery can be processed by Beam.
+ *
+ * <p>Hll sketch is computed by {@code HLL_COUNT.INIT} in BigQuery and read into Beam; the test
+ * verifies that we can run {@link HllCount.MergePartial} and {@link HllCount.Extract} on the
+ * sketch in Beam to get the correct estimated count.
+ */
+ @Test
+ public void testReadNonEmptySketchFromBigQuery() {
+ readSketchFromBigQuery(DATA_TABLE_ID_NON_EMPTY, EXPECTED_COUNT_NON_EMPTY);
+ }
+
+ /**
+ * Test that empty HLL++ sketch computed in BigQuery can be processed by Beam.
+ *
+ * <p>Hll sketch is computed by {@code HLL_COUNT.INIT} in BigQuery and read into Beam; the test
+ * verifies that we can run {@link HllCount.MergePartial} and {@link HllCount.Extract} on the
+ * sketch in Beam to get the correct estimated count.
*/
@Test
- public void testReadSketchFromBigQuery() {
- String tableSpec = String.format("%s.%s", DATASET_ID, DATA_TABLE_ID);
+ public void testReadEmptySketchFromBigQuery() {
+ readSketchFromBigQuery(DATA_TABLE_ID_EMPTY, EXPECTED_COUNT_EMPTY);
+ }
+
+ private void readSketchFromBigQuery(String tableId, Long expectedCount) {
+ String tableSpec = String.format("%s.%s", DATASET_ID, tableId);
String query =
String.format(
"SELECT HLL_COUNT.INIT(%s) AS %s FROM %s",
DATA_FIELD_NAME, QUERY_RESULT_FIELD_NAME, tableSpec);
+
SerializableFunction<SchemaAndRecord, byte[]> parseQueryResultToByteArray =
- (SchemaAndRecord schemaAndRecord) ->
- // BigQuery BYTES type corresponds to Java java.nio.ByteBuffer type
- ((ByteBuffer) schemaAndRecord.getRecord().get(QUERY_RESULT_FIELD_NAME)).array();
+ input -> {
+ // BigQuery BYTES type corresponds to Java java.nio.ByteBuffer type
+ ByteBuffer sketch = (ByteBuffer) input.getRecord().get(QUERY_RESULT_FIELD_NAME);
+ if (sketch == null) {
+ // Empty sketch is represented by null in BigQuery and by empty byte array in Beam
+ return new byte[0];
+ } else {
+ byte[] result = new byte[sketch.remaining()];
Review comment:
Why not `return sketch.array()` as before? Is it because we can't be 100% sure that the ByteBuffer is backed by an accessible array?
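
As background for the question above: `ByteBuffer.array()` throws `ReadOnlyBufferException` for a read-only buffer and `UnsupportedOperationException` for a buffer with no accessible backing array (a direct buffer, for instance). When it does succeed, it returns the whole backing array regardless of the buffer's position, limit, or array offset. A minimal sketch of a copy that avoids those cases follows. The class and method names (`SketchBytes`, `toByteArray`) are hypothetical and not taken from the pull request, and reading through `duplicate()` is an illustrative choice, not necessarily what the PR does.

```java
import java.nio.ByteBuffer;

// Hypothetical helper for illustration only; not part of the pull request.
final class SketchBytes {
  private SketchBytes() {}

  /**
   * Copies the bytes between position and limit into a new array.
   * A null buffer (an empty sketch coming from BigQuery) maps to an empty array.
   */
  static byte[] toByteArray(ByteBuffer buffer) {
    if (buffer == null) {
      return new byte[0];
    }
    byte[] result = new byte[buffer.remaining()];
    // duplicate() shares the content but has its own position and limit,
    // so the caller's buffer is left unchanged by the bulk get.
    buffer.duplicate().get(result);
    return result;
  }
}
```

Unlike `array()`, this also works for read-only and direct buffers, at the cost of one extra copy.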
----------------------------------------------------------------
Issue Time Tracking
-------------------
Worklog Id: (was: 330562)
Time Spent: 35h (was: 34h 50m)
> A new count distinct transform based on BigQuery compatible HyperLogLog++ implementation
> ----------------------------------------------------------------------------------------
>
> Key: BEAM-7013
> URL: https://issues.apache.org/jira/browse/BEAM-7013
> Project: Beam
> Issue Type: New Feature
> Components: extensions-java-sketching, sdk-java-core
> Reporter: Yueyang Qiu
> Assignee: Yueyang Qiu
> Priority: Major
> Fix For: 2.16.0
>
> Time Spent: 35h
> Remaining Estimate: 0h
>