[ 
https://issues.apache.org/jira/browse/PARQUET-1020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17553618#comment-17553618
 ] 

ASF GitHub Bot commented on PARQUET-1020:
-----------------------------------------

guillaume-fetter commented on PR #963:
URL: https://github.com/apache/parquet-mr/pull/963#issuecomment-1154004113

   @dossett It depends on your use case. If you are running a simple program 
that processes data on a single host, then you're good. If you are using a 
distributed data processing tool (Flink, in my case), you can't pass a 
DynamicMessage instance from one task to another, or at least I did not find a 
way to make it work.
   For unrelated reasons, we are using the SelfDescribingMessage design pattern 
(https://developers.google.com/protocol-buffers/docs/techniques#self-description),
 which is a concrete message type and therefore serializable. On top of that, 
we wrote a Parquet writer that converts the SelfDescribingMessage to a 
DynamicMessage and then writes it using this upgraded ProtoWriteSupport.
   
   It's clearly convoluted unless you are already using a SelfDescribingMessage 
or equivalent.
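
   As a rough illustration of the conversion step (a minimal sketch, not the 
code from this PR): assuming the self-describing wrapper carries a 
FileDescriptorSet, a fully qualified type name, and the serialized payload, 
the DynamicMessage can be rebuilt on the task that writes it. The class name 
`SelfDescribingToDynamic` and the method `toDynamicMessage` are hypothetical, 
and only top-level message types are looked up.

```java
import com.google.protobuf.ByteString;
import com.google.protobuf.DescriptorProtos.FileDescriptorProto;
import com.google.protobuf.DescriptorProtos.FileDescriptorSet;
import com.google.protobuf.Descriptors.Descriptor;
import com.google.protobuf.Descriptors.DescriptorValidationException;
import com.google.protobuf.Descriptors.FileDescriptor;
import com.google.protobuf.DynamicMessage;
import com.google.protobuf.InvalidProtocolBufferException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: rebuilds a DynamicMessage from the parts that a
// self-describing wrapper is assumed to carry.
final class SelfDescribingToDynamic {

  static DynamicMessage toDynamicMessage(
      FileDescriptorSet descriptorSet, String fullTypeName, ByteString payload)
      throws DescriptorValidationException, InvalidProtocolBufferException {
    // Index the file protos by name so dependencies can be resolved in any order.
    Map<String, FileDescriptorProto> protos = new HashMap<>();
    for (FileDescriptorProto proto : descriptorSet.getFileList()) {
      protos.put(proto.getName(), proto);
    }
    Map<String, FileDescriptor> built = new HashMap<>();
    for (String fileName : protos.keySet()) {
      FileDescriptor file = build(fileName, protos, built);
      // Only top-level message types are checked in this sketch.
      for (Descriptor type : file.getMessageTypes()) {
        if (type.getFullName().equals(fullTypeName)) {
          return DynamicMessage.parseFrom(type, payload);
        }
      }
    }
    throw new IllegalArgumentException("Unknown message type: " + fullTypeName);
  }

  // Builds a FileDescriptor after recursively building its dependencies.
  private static FileDescriptor build(
      String name,
      Map<String, FileDescriptorProto> protos,
      Map<String, FileDescriptor> built)
      throws DescriptorValidationException {
    FileDescriptor cached = built.get(name);
    if (cached != null) {
      return cached;
    }
    FileDescriptorProto proto = protos.get(name);
    if (proto == null) {
      throw new IllegalArgumentException("Missing dependency: " + name);
    }
    FileDescriptor[] deps = new FileDescriptor[proto.getDependencyCount()];
    for (int i = 0; i < deps.length; i++) {
      deps[i] = build(proto.getDependency(i), protos, built);
    }
    FileDescriptor file = FileDescriptor.buildFrom(proto, deps);
    built.put(name, file);
    return file;
  }
}
```

   The resulting DynamicMessage is what would then be handed to the writer; 
how the descriptor reaches ProtoWriteSupport is specific to this PR and is not 
shown here.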




> Add support for Dynamic Messages in parquet-protobuf
> ----------------------------------------------------
>
>                 Key: PARQUET-1020
>                 URL: https://issues.apache.org/jira/browse/PARQUET-1020
>             Project: Parquet
>          Issue Type: New Feature
>          Components: parquet-protobuf
>            Reporter: Alex Buck
>            Assignee: Alex Buck
>            Priority: Major
>
> Hello. We would like to pass in a DynamicMessage rather than using the 
> generated protobuf classes to allow us to make our job very generic. 
> I think this could be achieved by setting the descriptor up front, similarly 
> to how there is a ProtoParquetOutputFormat today.
> In its init method, ProtoWriteSupport could then generate the parquet 
> schema with ProtoSchemaConverter from the passed-in descriptor, rather 
> than taking it from the generated proto class.
> Would there be interest in incorporating this change? If so, does the approach 
> above sound sensible? I am happy to do a pull request.
> Initial PR here: https://github.com/apache/parquet-mr/pull/414



--
This message was sent by Atlassian Jira
(v8.20.7#820007)
