shangxinli commented on pull request #808:
URL: https://github.com/apache/parquet-mr/pull/808#issuecomment-671465375
> But you have other options in the same WriteSupport init function. One of
> them is to use the Hadoop configuration. It is not global anymore, but rather a
> local copy, an object created for the given worker. It still carries the global
> properties, and you can add your crypto properties here; they will be carried
> to the crypto factory.
@ggershinsky, thanks for understanding! As discussed in the 3rd paragraph of
my last reply, we do have the option of using 'Configuration', but we prefer to
keep the column-level settings within the schema itself. The reasons are
explained in the 1st paragraph. Although it is a local copy, the problems I
mentioned are still there. I tried that solution before arriving at the current
one. Using the 'extraMetadata' field is slightly better, but using the schema
is better still.
Since we have already defined the EncryptionPropertiesFactory interface, I
think we can let users decide how they construct it and through which channel
they get the information they need, because in practice there could be many
other use cases that we don't know about at this point. They should choose the
one that best fits their use case. I don't think we should, or can, limit this
to a single channel. As you mentioned, they can use 'Configuration' and
'extraMetadata' today, which is already two. People could also fetch the
information through RPC calls, etc.
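To illustrate the point, here is a minimal sketch of one factory-style contract with two interchangeable channels. All of the types and property names below (ColumnSketch, ColumnKeyResolver, "crypto.key.", "encryptionKeyId") are hypothetical stand-ins made up for this example. They are not the parquet-mr API and not what this PR implements; a plain Map stands in for the Hadoop 'Configuration'.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a schema column that carries its own metadata.
final class ColumnSketch {
    final String name;
    final Map<String, String> metadata;

    ColumnSketch(String name, Map<String, String> metadata) {
        this.name = name;
        this.metadata = metadata;
    }
}

// Hypothetical, simplified contract: map each encrypted column name to a key ID.
interface ColumnKeyResolver {
    Map<String, String> resolve(Map<String, String> conf, List<ColumnSketch> schema);
}

// Channel 1: read "crypto.key.<column>" entries from the configuration map.
final class ConfResolver implements ColumnKeyResolver {
    static final String PREFIX = "crypto.key."; // assumed property prefix

    @Override
    public Map<String, String> resolve(Map<String, String> conf, List<ColumnSketch> schema) {
        Map<String, String> keys = new HashMap<>();
        for (Map.Entry<String, String> e : conf.entrySet()) {
            if (e.getKey().startsWith(PREFIX)) {
                keys.put(e.getKey().substring(PREFIX.length()), e.getValue());
            }
        }
        return keys;
    }
}

// Channel 2: read the key ID from each column's own metadata.
final class SchemaResolver implements ColumnKeyResolver {
    static final String KEY_ID = "encryptionKeyId"; // assumed metadata key

    @Override
    public Map<String, String> resolve(Map<String, String> conf, List<ColumnSketch> schema) {
        Map<String, String> keys = new HashMap<>();
        for (ColumnSketch column : schema) {
            String keyId = column.metadata.get(KEY_ID);
            if (keyId != null) {
                keys.put(column.name, keyId);
            }
        }
        return keys;
    }
}

final class ResolverDemo {
    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put("crypto.key.ssn", "key-from-conf");

        List<ColumnSketch> schema = Arrays.asList(
            new ColumnSketch("ssn", Collections.singletonMap("encryptionKeyId", "key-from-schema")),
            new ColumnSketch("age", Collections.<String, String>emptyMap()));

        // The caller picks whichever channel fits its use case.
        System.out.println(new ConfResolver().resolve(conf, schema));
        System.out.println(new SchemaResolver().resolve(conf, schema));
    }
}
```

With the schema channel, the setting travels with the column definition, so nothing has to be kept in sync between the schema and a separate property list.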
In my opinion, Parquet should support a metadata field at the column level in
the schema. As I have mentioned several times, many other systems' schemas,
such as Avro and Spark, already have one. If we have column-wise
configurations, that would be a good place to set them. I think Gabor already
has a PR for column-wise configuration.
I think we are clear on the use cases, the needs, and the changes themselves.
Can you share your concern about adding the metadata field? I feel it should be
a safe change (the earlier comments with Gabor discussed that). Please let me
know what you think.