[
https://issues.apache.org/jira/browse/PARQUET-1020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17553618#comment-17553618
]
ASF GitHub Bot commented on PARQUET-1020:
-----------------------------------------
guillaume-fetter commented on PR #963:
URL: https://github.com/apache/parquet-mr/pull/963#issuecomment-1154004113
@dossett It depends on your use case. If you are running a simple program that
does data processing on a single host, then you're good. If you are using a big
data processing tool (Flink, in my case), you can't pass a DynamicMessage
instance from one task to another, or at least I did not find a way to make it
work.
For unrelated reasons, we are using the SelfDescribingMessage design pattern
(https://developers.google.com/protocol-buffers/docs/techniques#self-description),
which is a concrete message type and therefore serializable. From there we
wrote a Parquet writer that converts the SelfDescribingMessage to a
DynamicMessage and then writes it using this upgraded ProtoWriteSupport.
It's clearly convoluted unless you are already using a SelfDescribingMessage
or equivalent.
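For anyone curious, the conversion step described above could look roughly like
the sketch below. This is not our actual code: the class name
`SelfDescribingToDynamic` and the method `toDynamicMessage` are made up for
illustration, and it assumes the descriptor set holds a single .proto file with
no imports. It takes the three parts a self-describing message carries
(serialized FileDescriptorSet, message type name, payload bytes) and uses only
the standard protobuf-java API. Handing the result to ProtoWriteSupport is left
out.

```java
import com.google.protobuf.ByteString;
import com.google.protobuf.DescriptorProtos.FileDescriptorProto;
import com.google.protobuf.DescriptorProtos.FileDescriptorSet;
import com.google.protobuf.Descriptors.Descriptor;
import com.google.protobuf.Descriptors.DescriptorValidationException;
import com.google.protobuf.Descriptors.FileDescriptor;
import com.google.protobuf.DynamicMessage;
import com.google.protobuf.InvalidProtocolBufferException;

/** Hypothetical helper; the name and shape are assumptions for illustration. */
public final class SelfDescribingToDynamic {
    private SelfDescribingToDynamic() {}

    /**
     * Rebuilds a DynamicMessage from serialized parts that can safely cross
     * task boundaries.
     *
     * @param descriptorSet serialized FileDescriptorSet; assumed to contain
     *                      exactly one file with no dependencies
     * @param messageName   unqualified name of a top-level message in that file
     * @param payload       serialized bytes of the message itself
     */
    public static DynamicMessage toDynamicMessage(
            ByteString descriptorSet, String messageName, ByteString payload)
            throws InvalidProtocolBufferException, DescriptorValidationException {
        FileDescriptorSet set = FileDescriptorSet.parseFrom(descriptorSet);
        FileDescriptorProto fileProto = set.getFile(0);

        // No imports assumed, so the dependency array is empty.
        FileDescriptor file =
                FileDescriptor.buildFrom(fileProto, new FileDescriptor[0]);

        Descriptor type = file.findMessageTypeByName(messageName);
        if (type == null) {
            throw new IllegalArgumentException(
                    "Message type not found in descriptor set: " + messageName);
        }
        return DynamicMessage.parseFrom(type, payload);
    }
}
```

In a real job you would probably build the Descriptor once per task and cache
it rather than rebuilding it for every record, and files with imports would
need their dependencies resolved first.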
> Add support for Dynamic Messages in parquet-protobuf
> ----------------------------------------------------
>
> Key: PARQUET-1020
> URL: https://issues.apache.org/jira/browse/PARQUET-1020
> Project: Parquet
> Issue Type: New Feature
> Components: parquet-protobuf
> Reporter: Alex Buck
> Assignee: Alex Buck
> Priority: Major
>
> Hello. We would like to pass in a DynamicMessage rather than using the
> generated protobuf classes to allow us to make our job very generic.
> I think this could be achieved by setting the descriptor upfront, similarly
> to how there is a ProtoParquetOutputFormat today.
> In ProtoWriteSupport in the init method it could then generate the parquet
> schema created by ProtoSchemaConverter using the passed in descriptor, rather
> than taking it from the generated proto class.
> Would there be interest in incorporating this change? If so does the approach
> above sound sensible? I am happy to do a pull request
> initial PR here: https://github.com/apache/parquet-mr/pull/414