Hi Everyone,

Unfortunately, I was not able to participate in the last Parquet sync, where you discussed how to separate the public Java code that is public for internal use only from the code that is public in order to be exposed to the user. I am currently adding some new code related to the column indexes and would like to ask for your opinions.

- Use the Yetus InterfaceAudience annotations to mark the public/private classes
  pros:
  - backward compatible (if we want to categorise the already existing API, it can be done by using this)
  - already used by some of the Hadoop components
  cons:
  - not sure if it is much better than a simple comment; I don’t know of any tooling that checks this
  - even if we mark some existing API private, we cannot remove or modify it, otherwise we break compatibility

- Use an internal package naming convention for the new classes
  pros:
  - obvious to the user
  - future proof (if we would like to use the Java 9 module feature in parquet-mr 2.x)
  cons:
  - backward incompatible (we are not able to categorise the already existing API)
  - can be ignored; I don’t know of any way to enforce it on Java 8

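To make the two options more concrete, here is a rough sketch of how each marker could be detected in plain Java. It is only an illustration: PrivateAudience is a locally declared, hypothetical stand-in for the Yetus InterfaceAudience.Private annotation (so that the sketch compiles without the Yetus dependency), and ExistingHelper, PublicReader and AudienceSketch are made-up names, not parquet-mr classes.

```java
import java.lang.annotation.Documented;
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

class AudienceSketch {
  // Option 1 check: is the class marked with the (stand-in) audience annotation?
  static boolean isAnnotatedPrivate(Class<?> type) {
    return type.isAnnotationPresent(PrivateAudience.class);
  }

  // Option 2 check: does the package name contain an "internal" segment?
  static boolean isInternalPackage(String packageName) {
    for (String segment : packageName.split("\\.")) {
      if (segment.equals("internal")) {
        return true;
      }
    }
    return false;
  }

  public static void main(String[] args) {
    System.out.println(isAnnotatedPrivate(ExistingHelper.class));
    System.out.println(isAnnotatedPrivate(PublicReader.class));
    System.out.println(isInternalPackage("org.apache.parquet.internal.column.columnindex"));
    System.out.println(isInternalPackage("org.apache.parquet.column.columnindex"));
  }
}

// Hypothetical stand-in for Yetus' InterfaceAudience.Private (assumption:
// runtime retention, so that reflection can see it).
@Documented
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.TYPE)
@interface PrivateAudience {}

// Option 1: an existing class keeps its package and is only annotated.
@PrivateAudience
class ExistingHelper {}

// An unannotated class, treated as public API in this sketch.
class PublicReader {}
```

Neither check stops a user from calling the internal code; both only make the intent visible to a reviewer or to a build-time check someone would still have to write.
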
If we would like to mark the internal API in the existing code, I would vote for using the Yetus annotations, while for the new code I think internal packages are the better choice. What do you think?

For the internal packages, it is another question how exactly we would like to implement them. For example, I am adding some new classes related to column indexes to the package org.apache.parquet.column.columnindex. All of these classes are for internal use only. Where should I put “internal”? I would vote for using org.apache.parquet.internal.column.columnindex, so that later all modules would have the org.apache.parquet.internal structure for internal use and everything else would be public. What do you think?

Thanks a lot,
Gabor
