amalakar opened a new pull request #7865: [FLINK-9650] [formats] add support 
for protobuf objects
URL: https://github.com/apache/flink/pull/7865
 
 
   flink-protobuf
   ==========
   
   This library adds support to Flink for running SQL against protobuf objects.
   As of now, Flink supports only Avro and JSON files backed by a JSON Schema.
   To support SQL, Flink needs to know the `TypeInformation` of the records;
   this library provides `TypeInformation` for protobuf objects.
   
   It uses the protobuf APIs to retrieve the fields and types of a protobuf object
   and then provides each field name and type to Flink as a
   [PojoField](https://github.com/apache/flink/blob/master/flink-core/src/main/java/org/apache/flink/api/java/typeutils/PojoField.java).
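
As a rough illustration of what "field name and type" pairs look like, the sketch below lists the instance fields of a class. It is not the code in this PR: it uses plain Java reflection rather than the protobuf APIs, it does not build Flink `PojoField` objects, and `PersonStandIn` and `FieldLister` are hypothetical names invented for the example. The stand-in class only imitates the trailing-underscore field names mentioned in the limitations.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for a protobuf message class (assumption, not generated code).
final class PersonStandIn {
    private static final long serialVersionUID = 0L; // static, so it is skipped below
    private int currentAge_;
    private String region_;
    private long loggedAt_;
}

final class FieldLister {
    /** Collects the non-static, non-synthetic declared fields of a class as name -> type. */
    static Map<String, Class<?>> listFields(Class<?> type) {
        // TreeMap gives a stable, name-sorted order; getDeclaredFields() guarantees no order.
        Map<String, Class<?>> fields = new TreeMap<>();
        for (Field field : type.getDeclaredFields()) {
            if (Modifier.isStatic(field.getModifiers()) || field.isSynthetic()) {
                continue;
            }
            fields.put(field.getName(), field.getType());
        }
        return fields;
    }

    public static void main(String[] args) {
        listFields(PersonStandIn.class)
            .forEach((name, fieldType) -> System.out.println(name + " : " + fieldType.getSimpleName()));
    }
}
```

Each name and type pair found this way is the kind of information that a type-information provider would hand to Flink, one entry per field.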
   
   Current limitations:
   
   - In a protobuf object, field names end with an underscore, like `loggedAt_`,
   so in SQL a field has to be referred to as `loggedAt_` instead of `logged_at`.
   This should be fixable in the Flink APIs, but would need some digging around
   in the code. Whitelisting `Message` classes in `PojoField` should help.
   
   - Some field types, such as `Enum`, are not supported yet, but support should
   be trivial to add.
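
To make the naming limitation concrete, here is a minimal sketch of the mapping a fix would need, from names like `loggedAt_` to names like `logged_at`. It is not part of this PR and says nothing about where such a mapping would live in Flink; `ProtoFieldNames` and `toSqlName` are hypothetical names, and the rule (strip one trailing underscore, then convert camelCase to snake_case) is an assumption that may not round-trip for every field name.

```java
final class ProtoFieldNames {
    /** Converts a name such as "loggedAt_" into a snake_case name such as "logged_at". */
    static String toSqlName(String javaFieldName) {
        // Drop a single trailing underscore, if present.
        String base = javaFieldName.endsWith("_")
            ? javaFieldName.substring(0, javaFieldName.length() - 1)
            : javaFieldName;

        StringBuilder out = new StringBuilder();
        for (int i = 0; i < base.length(); i++) {
            char c = base.charAt(i);
            if (Character.isUpperCase(c)) {
                // Start a new snake_case word at each uppercase letter.
                if (out.length() > 0) {
                    out.append('_');
                }
                out.append(Character.toLowerCase(c));
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(toSqlName("loggedAt_"));   // prints "logged_at"
        System.out.println(toSqlName("currentAge_")); // prints "current_age"
        System.out.println(toSqlName("region_"));     // prints "region"
    }
}
```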
   
   With this it is possible to run a query like the following on a stream of,
   say, `ride_requested`:
   
   ```sql
   SELECT region_,
          count(*)
   FROM people
   WHERE currentAge_ > 40
     AND region_ IN ('SFO',
                    'BKN')
   GROUP BY region_
   ```
   
   Note: I have been a bit hasty in getting this out. It was sitting in our
   internal repo for a while and I haven't had the time to clean it up to make it
   Flink-ready, but I wanted to publish the code so that anyone who wants to work
   on it can start from this rather than from scratch. We have been using this in
   production for close to a year. Due to other commitments I may not get a chance
   to address coding style and review comments immediately, so I wouldn't mind if
   someone wants to improve this before merge. For example, there are pending TODO
   items such as enum support and a change in `PojoField` to make the SQL nicer
   (no underscore).
   
   (Apologies for not conforming to the coding style and the rest of the
   guidelines yet; I hope this is still useful to someone as a beta-version
   patch.)
   
   

