guillaume-fetter commented on PR #963: URL: https://github.com/apache/parquet-mr/pull/963#issuecomment-1154004113
@dossett It depends on your use case. If you are running a simple program that processes data on a single host, then you're good. If you are using a distributed data processing tool (like me here, with Flink), you can't pass a DynamicMessage instance from one task to another, or at least I did not find a way to make it work. For unrelated reasons, we are using the SelfDescribingMessage design pattern (https://developers.google.com/protocol-buffers/docs/techniques#self-description), where the wrapper is itself a specific message type and is therefore serializable. From there we wrote a Parquet writer that converts the SelfDescribingMessage to a DynamicMessage and then writes it using this upgraded ProtoWriteSupport. It's clearly convoluted unless you are already using a SelfDescribingMessage or equivalent.
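
A minimal sketch of why the self-describing approach can cross task boundaries, using only the JDK. It is not the writer described in the comment. `SelfDescribingEnvelope`, its fields, and `roundTrip` are hypothetical names, and plain Java serialization is only an assumed stand-in for whatever mechanism the processing framework actually uses. The point is that the schema bytes travel with the payload bytes, so the receiving side has what it needs to rebuild a dynamic message.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

/** Hypothetical stand-in for a SelfDescribingMessage: the schema is shipped next to the data. */
final class SelfDescribingEnvelope implements Serializable {
    private static final long serialVersionUID = 1L;

    // Assumption: a serialized FileDescriptorSet that describes the message type.
    private final byte[] descriptorSetBytes;
    // Assumption: fully qualified name of the message type inside that descriptor set.
    private final String typeName;
    // Assumption: the message itself, encoded in protobuf wire format.
    private final byte[] payloadBytes;

    SelfDescribingEnvelope(byte[] descriptorSetBytes, String typeName, byte[] payloadBytes) {
        // Defensive copies keep the envelope immutable.
        this.descriptorSetBytes = descriptorSetBytes.clone();
        this.typeName = typeName;
        this.payloadBytes = payloadBytes.clone();
    }

    byte[] getDescriptorSetBytes() {
        return descriptorSetBytes.clone();
    }

    String getTypeName() {
        return typeName;
    }

    byte[] getPayloadBytes() {
        return payloadBytes.clone();
    }

    /** Serializes and deserializes the envelope, imitating a hand-off between two tasks. */
    static SelfDescribingEnvelope roundTrip(SelfDescribingEnvelope envelope) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
                out.writeObject(envelope);
            }
            try (ObjectInputStream in =
                    new ObjectInputStream(new ByteArrayInputStream(buffer.toByteArray()))) {
                return (SelfDescribingEnvelope) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("Envelope could not be round-tripped", e);
        }
    }

    // On the receiving side, with protobuf-java on the classpath, one would typically
    // parse descriptorSetBytes into a FileDescriptorSet, build the Descriptor for
    // typeName, and call DynamicMessage.parseFrom(descriptor, payloadBytes) before
    // handing the result to the writer. That step is left out to keep this JDK-only.
}
```

Calling `SelfDescribingEnvelope.roundTrip(envelope)` returns a distinct copy with the same type name and byte contents, which is the property the comment relies on when it says the wrapper is serializable.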
