Hi Everyone,

Unfortunately, I was not able to participate in the last Parquet sync, where you 
discussed how to separate the public Java code that is for internal use only 
from the public code that is meant to be exposed to the user. I am currently 
adding some new code related to the column indexes and would like to ask for 
your opinions on the following two options.
1. Use the Yetus InterfaceAudience annotations to mark the public/private classes
   pros:
   - backward compatible (if we want to categorise the already existing API, it
     can be done by using this)
   - already used by some of the Hadoop components
   cons:
   - not sure if it is much better than a simple comment; don’t know any tooling
     that checks this
   - even if we mark some existing API private we cannot remove/modify them,
     otherwise we break the compatibility
2. Use internal package naming convention for the new classes
   pros:
   - obvious to the user
   - future proof (if we would like to use the Java 9 module feature in
     parquet-mr 2.x)
   cons:
   - backward incompatible (we are not able to categorise the already existing API)
   - can be ignored; I don’t know any way for Java 8 to enforce it
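
To make the first option more concrete, here is a minimal sketch of how audience annotations could be applied and checked via reflection. It is only an illustration: the nested InterfaceAudience type is a local stand-in that mimics the shape of the Yetus annotations so the snippet compiles without the dependency, and ExampleReader, ExampleIndexBuilder and isInternal are hypothetical names, not existing parquet-mr code.

```java
import java.lang.annotation.Documented;
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

public class AudienceSketch {

  // Local stand-in for org.apache.yetus.audience.InterfaceAudience
  // (assumption: real code would import the Yetus annotations instead).
  public static final class InterfaceAudience {
    private InterfaceAudience() {}

    @Documented
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.TYPE)
    public @interface Public {}

    @Documented
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.TYPE)
    public @interface Private {}
  }

  // Hypothetical class meant to be exposed to users.
  @InterfaceAudience.Public
  public static class ExampleReader {}

  // Hypothetical class meant for internal use only. It still has to be
  // declared public so that other packages of the project can reach it.
  @InterfaceAudience.Private
  public static class ExampleIndexBuilder {}

  // Returns true if the type is marked as internal. A build-time check
  // could use something like this to flag usages of internal classes.
  public static boolean isInternal(Class<?> type) {
    return type.isAnnotationPresent(InterfaceAudience.Private.class);
  }

  public static void main(String[] args) {
    System.out.println(isInternal(ExampleReader.class));
    System.out.println(isInternal(ExampleIndexBuilder.class));
  }
}
```

The annotation by itself does not stop anyone from calling ExampleIndexBuilder; it only records the intent, which matches the concern listed under the cons above.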

If we would like to mark the internal API in the existing code, I would vote 
for using the Yetus annotations, while for the new code I think internal 
packages are the better choice. What do you think?

For the internal packages, another question is how exactly we would like to 
implement them. For example, I am adding some new classes related to column 
indexes into the package org.apache.parquet.column.columnindex. All of these 
classes are for internal use only. Where should I put “internal”? I would vote 
for using org.apache.parquet.internal.column.columnindex, so that later all 
modules would have the org.apache.parquet.internal structure for internal use 
and everything else would be public. What do you think?
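
To illustrate the proposed layout, here is a small sketch of how the convention could be recognised programmatically, for example in a simple build-time check. The class and method names (InternalPackageSketch, isInternalPackage) are made up for this example; only the two package names come from the proposal above.

```java
public class InternalPackageSketch {

  // Root of the proposed internal package tree.
  static final String INTERNAL_PREFIX = "org.apache.parquet.internal";

  // Returns true if the package is the internal root or one of its
  // sub-packages. The "." check avoids matching unrelated packages that
  // merely start with the same characters.
  public static boolean isInternalPackage(String packageName) {
    return packageName.equals(INTERNAL_PREFIX)
        || packageName.startsWith(INTERNAL_PREFIX + ".");
  }

  public static void main(String[] args) {
    // Proposed location for the new column index classes.
    System.out.println(
        isInternalPackage("org.apache.parquet.internal.column.columnindex"));
    // Location without the "internal" segment.
    System.out.println(
        isInternalPackage("org.apache.parquet.column.columnindex"));
  }
}
```

With a single common prefix, such a check stays a one-line rule for every module. If we move to Java 9 modules later, I would expect the same prefix to make it easy to leave the internal packages out of the exports in the module descriptor.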

Thanks a lot,
Gabor
