[
https://issues.apache.org/jira/browse/BEAM-7425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16860390#comment-16860390
]
Anton Kedin commented on BEAM-7425:
-----------------------------------
This is neither a bug nor a feature request; it is a user question about how to
convert the output of `BigQueryIO.read()` to POJOs. No concrete solution can be
implemented for this in any specific release, so I am removing the fix version
tag: it shows up during the release process and is not actionable by the
release owner. I suggest asking on
[[email protected]|mailto:[email protected]] or
[[email protected]|mailto:[email protected]] if the issue is not resolved.
Please take a look at the examples of how to use BigQueryIO.read():
[https://beam.apache.org/releases/javadoc/2.13.0/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIO.html]
You can use `readTableRows()` instead and get the parsed values out.
Take a look at snippets here:
[https://github.com/apache/beam/blob/77cf84c634381495d45a112a9d147ad69394c0d4/examples/java/src/main/java/org/apache/beam/examples/snippets/Snippets.java#L168]
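For example, a minimal sketch along these lines (the `ReasonCode` POJO, its
constructor, the `reason_code_id` column, and the query are all hypothetical,
and it assumes the NUMERIC column arrives in the TableRow as a string, as in
the BigQuery JSON representation):
{code:java}
PCollection<ReasonCode> reasonCodes = pipeline
    .apply("Read rows",
        BigQueryIO.readTableRows()
            .fromQuery("SELECT reason_code_id FROM dataset.xxyyzz")
            .usingStandardSql())
    .apply("To POJO",
        MapElements
            .into(TypeDescriptor.of(ReasonCode.class))
            .via((TableRow row) ->
                // Assumes the NUMERIC value in the TableRow is a string,
                // which BigDecimal can consume directly.
                new ReasonCode(new BigDecimal((String) row.get("reason_code_id")))));
{code}
Depending on the POJO you may also need to set a coder on the resulting
collection, e.g. SerializableCoder.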
Or follow the TableRowParser implementation for an example of how such a
parser would work:
[https://github.com/apache/beam/blob/79d478a83be221461add1501e218b9a4308f9ec8/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIO.java#L449]
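In the same spirit, here is a rough sketch of a custom parse function for a
single NUMERIC column (the `reason_code_id` field and the query are
hypothetical; BigQuery exports NUMERIC to Avro as bytes with a decimal logical
type, and if the column is NULLABLE the field schema is a union that would
have to be unwrapped first):
{code:java}
static final SerializableFunction<SchemaAndRecord, BigDecimal> PARSE_NUMERIC =
    schemaAndRecord -> {
      GenericRecord record = schemaAndRecord.getRecord();
      // Use the field's schema, not the record's: the decimal logical type
      // is attached to the field.
      Schema fieldSchema =
          record.getSchema().getField("reason_code_id").schema();
      return new Conversions.DecimalConversion()
          .fromBytes(
              (ByteBuffer) record.get("reason_code_id"),
              fieldSchema,
              fieldSchema.getLogicalType());
    };

PCollection<BigDecimal> values = pipeline.apply(
    BigQueryIO.read(PARSE_NUMERIC)
        .fromQuery("SELECT reason_code_id FROM dataset.xxyyzz")
        .usingStandardSql()
        .withCoder(SerializableCoder.of(BigDecimal.class)));
{code}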
> Reading BigQuery Table Data into Java Classes(Pojo) Directly
> ------------------------------------------------------------
>
> Key: BEAM-7425
> URL: https://issues.apache.org/jira/browse/BEAM-7425
> Project: Beam
> Issue Type: New Feature
> Components: beam-model
> Affects Versions: 2.12.0
> Environment: Dataflow
> Reporter: Kishan Kumar
> Priority: Major
> Labels: newbie, patch
> Fix For: 2.14.0
>
>
> While developing my code I used the snippet below to read table data from
> BigQuery.
>
> {code:java}
> PCollection<ReasonCode> gpseEftReasonCodes = input
>     .apply("Reading xxyyzz",
>         BigQueryIO
>             .read(new ReadTable<ReasonCode>(ReasonCode.class))
>             .withoutValidation()
>             .withTemplateCompatibility()
>             .fromQuery("Select * from dataset.xxyyzz")
>             .usingStandardSql()
>             .withCoder(SerializableCoder.of(ReasonCode.class)));
> {code}
> Read Table Class:
> {code:java}
> @DefaultSchema(JavaBeanSchema.class)
> public class ReadTable<T> implements SerializableFunction<SchemaAndRecord, T> {
>   private static final long serialVersionUID = 1L;
>   private static final Gson gson = new Gson();
>   public static final Logger LOG = LoggerFactory.getLogger(ReadTable.class);
>   private final Counter countingRecords =
>       Metrics.counter(ReadTable.class, "Reading Records EFT Report");
>   private final Class<T> class1;
>
>   public ReadTable(Class<T> class1) { this.class1 = class1; }
>
>   @Override
>   public T apply(SchemaAndRecord schemaAndRecord) {
>     Map<String, String> mapping = new HashMap<>();
>     try {
>       GenericRecord s = schemaAndRecord.getRecord();
>       org.apache.avro.Schema s1 = s.getSchema();
>       for (Field f : s1.getFields()) {
>         // Look each field up by name; indexing with a separately
>         // incremented counter reads the wrong position.
>         Object value = s.get(f.name());
>         mapping.put(f.name(), value == null ? null : String.valueOf(value));
>       }
>       countingRecords.inc();
>       JsonElement jsonElement = gson.toJsonTree(mapping);
>       return gson.fromJson(jsonElement, class1);
>     } catch (Exception mp) {
>       LOG.error("Found wrong mapping for the record: " + mapping, mp);
>       return null;
>     }
>   }
> }
> {code}
> So after reading the data from BigQuery and mapping the SchemaAndRecord to
> the POJO, I was getting the value below for columns whose data type is
> NUMERIC:
> {code}
> last_update_amount=java.nio.HeapByteBuffer[pos=0 lim=16 cap=16]
> {code}
> My expectation was that I would get the exact value, but instead I get the
> HeapByteBuffer. The version I am using is Apache Beam 2.12.0. If any more
> information is needed, please let me know.
> Second way tried:
> {code:java}
> GenericRecord s = schemaAndRecord.getRecord();
> org.apache.avro.Schema s1 = s.getSchema();
> for (Field f : s1.getFields()) {
>   mapping.put(f.name(),
>       null == s.get(f.name()) ? null : String.valueOf(s.get(f.name())));
>   if (f.name().equalsIgnoreCase("reason_code_id")) {
>     // This is where the exception below is thrown: s1 is the record schema
>     // (type RECORD), so Schema.create(s1.getType()) cannot be built, and
>     // the decimal logical type lives on the field's schema, not the
>     // record's.
>     BigDecimal numericValue = new Conversions.DecimalConversion()
>         .fromBytes((ByteBuffer) s.get(f.name()), Schema.create(s1.getType()),
>             s1.getLogicalType());
>     System.out.println("Numeric Con" + numericValue);
>   } else {
>     System.out.println("Else Condition " + f.name());
>   }
> }
> {code}
> Resulting exception:
> {code}
> 2019-05-24 (14:10:37) org.apache.avro.AvroRuntimeException: Can't create a:
> RECORD
> {code}
>
> It would be great if there were a method that maps the BigQuery data onto
> the POJO's schema. That is, if I have 10 columns in BigQuery but my POJO
> needs only 5 of them, BigQueryIO should map just those 5 values into the
> Java class and reject the rest, which I am currently doing myself with a
> lot of effort. The NUMERIC data type should also be deserialized
> automatically while fetching data, as it is with TableRow.
>
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)