nielm commented on a change in pull request #12611:
URL: https://github.com/apache/beam/pull/12611#discussion_r501066733
##########
File path: sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/spanner/SpannerIO.java
##########
@@ -678,6 +703,42 @@ public Read withPartitionOptions(PartitionOptions partitionOptions) {
.withTransaction(getTransaction());
return input.apply(Create.of(getReadOperation())).apply("Execute query", readAll);
}
+
+ SerializableFunction<Struct, Row> getFormatFn() {
+ return (SerializableFunction<Struct, Row>)
+ input ->
+ Row.withSchema(Schema.builder().addInt64Field("Key").build())
+ .withFieldValue("Key", 3L)
+ .build();
+ }
+ }
+
+ public static class ReadRows extends PTransform<PBegin, PCollection<Row>> {
+ Read read;
+ Schema schema;
+
+ public ReadRows(Read read, Schema schema) {
+ super("Read rows");
+ this.read = read;
+ this.schema = schema;
Review comment:
I don't see any good solution here...
When reading an entire table, it would be possible to read the table's
schema first and determine the column types, but this does not work for a
query, since the query's output columns may not correspond to table columns.
Adding `LIMIT 1` would only work for simple queries; anything with joins,
`GROUP BY`, or `ORDER BY` would require the majority of the query to be
executed before a single row is returned.
So the only solution I can see is for the caller to specify the row Schema,
as you do here.
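For illustration, a minimal sketch of the call site this implies, assuming the `ReadRows` transform from this diff; the instance, database, query, and schema columns are hypothetical:

```java
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.gcp.spanner.SpannerIO;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.schemas.Schema;
import org.apache.beam.sdk.values.PCollection;
import org.apache.beam.sdk.values.Row;

public class SpannerReadRowsExample {
  public static void main(String[] args) {
    Pipeline p = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());

    // The caller declares the output schema up front, since it cannot be
    // inferred from an arbitrary query without executing it.
    // (Hypothetical columns, for illustration only.)
    Schema schema =
        Schema.builder().addInt64Field("Key").addStringField("Value").build();

    SpannerIO.Read read =
        SpannerIO.read()
            .withInstanceId("my-instance")   // hypothetical instance
            .withDatabaseId("my-database")   // hypothetical database
            .withQuery("SELECT Key, Value FROM MyTable");

    // ReadRows (from this PR) wraps the Read and attaches the
    // caller-provided schema to the resulting PCollection<Row>.
    PCollection<Row> rows = p.apply(new SpannerIO.ReadRows(read, schema));

    p.run().waitUntilFinish();
  }
}
```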