[jira] [Commented] (PARQUET-2347) Add interface layer between Parquet and Hadoop Configuration

ASF GitHub Bot (Jira) Sat, 16 Sep 2023 02:58:07 -0700


    [ 
https://issues.apache.org/jira/browse/PARQUET-2347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17765935#comment-17765935
 ]


ASF GitHub Bot commented on PARQUET-2347:
-----------------------------------------

amousavigourabi opened a new pull request, #1141:
URL: https://github.com/apache/parquet-mr/pull/1141

   Make sure you have checked _all_ steps below.
   
   ### Jira
   
   - [x] My PR addresses the following [Parquet 
Jira](https://issues.apache.org/jira/browse/PARQUET/) issues and references 
them in the PR title. For example, "PARQUET-1234: My Parquet PR"
     - https://issues.apache.org/jira/browse/PARQUET-XXX
     - In case you are adding a dependency, check if the license complies with 
the [ASF 3rd Party License 
Policy](https://www.apache.org/legal/resolved.html#category-x).
   
   ### Tests
   
   - [x] My PR adds the following unit tests __OR__ does not need testing for 
this extremely good reason:
          The existing read/write tests gain additional parameters so that 
they also run against the new interface-based methods.
   
   ### Commits
   
   - [x] My commits all reference Jira issues in their subject lines. In 
addition, my commits follow the guidelines from "[How to write a good git 
commit message](http://chris.beams.io/posts/git-commit/)":
     1. Subject is separated from body by a blank line
     1. Subject is limited to 50 characters (not including Jira issue reference)
     1. Subject does not end with a period
     1. Subject uses the imperative mood ("add", not "adding")
     1. Body wraps at 72 characters
     1. Body explains "what" and "why", not "how"
   
   ### Documentation
   
   - [x] In case of new functionality, my PR adds documentation that describes 
how to use it.
     - All the public functions and classes in the PR contain Javadoc that 
explains what they do
   
   ---
   
   japicmp exclusions have been added for the following classes: 
`org.apache.parquet.hadoop.CodecFactory`, 
`org.apache.parquet.hadoop.ParquetReader`. When these exclusions are removed, 
the following incompatibilities are detected:
   ```
   There is at least one incompatibility: 
org.apache.parquet.hadoop.CodecFactory.configuration:FIELD_TYPE_CHANGED,org.apache.parquet.hadoop.ParquetReader$Builder.conf:FIELD_TYPE_CHANGED
   ```
   
   This PR is part of an effort that has been [discussed on the dev mailing 
list](https://lists.apache.org/thread/4wl0l3d9dkpx4w69jx3rwnjk034dtqr8).
   




> Add interface layer between Parquet and Hadoop Configuration
> ------------------------------------------------------------
>
>                 Key: PARQUET-2347
>                 URL: https://issues.apache.org/jira/browse/PARQUET-2347
>             Project: Parquet
>          Issue Type: Improvement
>          Components: parquet-mr
>            Reporter: Atour Mousavi Gourabi
>            Priority: Minor
>
> Parquet relies heavily on a few Hadoop classes, such as its Configuration 
> class, which is used throughout Parquet's reading and writing logic. 
> Introducing our own interface for it could later allow users to use 
> Parquet's readers and writers without the Hadoop dependency.
> In order to preserve backward compatibility and avoid breaking downstream 
> projects, the constructors and methods that take Hadoop's Configuration 
> should be preserved for the time being, though I would favour deprecation 
> in the near future.
> This is part of an effort that has been [discussed on the dev mailing 
> list|https://lists.apache.org/thread/4wl0l3d9dkpx4w69jx3rwnjk034dtqr8].
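
The idea in the description above can be illustrated with a minimal sketch. 
Every name in it (`SketchConfiguration`, `MapConfiguration`, 
`SketchReaderOptions`, the key `sketch.page.size`) is hypothetical and is not 
taken from the PR or from parquet-mr. It only shows the general shape: calling 
code depends on a small interface, so a Hadoop-free implementation can be 
supplied.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical interface: the small slice of configuration behaviour that
// read/write code needs, with no Hadoop types in its signatures.
interface SketchConfiguration {
    void set(String name, String value);

    String get(String name);

    // Returns defaultValue when the key is absent.
    default int getInt(String name, int defaultValue) {
        String value = get(name);
        return value == null ? defaultValue : Integer.parseInt(value.trim());
    }
}

// Hypothetical Hadoop-free implementation backed by a plain map.
final class MapConfiguration implements SketchConfiguration {
    private final Map<String, String> values = new HashMap<>();

    @Override
    public void set(String name, String value) {
        values.put(name, value);
    }

    @Override
    public String get(String name) {
        return values.get(name);
    }
}

// Stand-in for a class whose configuration field is typed as the
// interface instead of a concrete Hadoop class.
final class SketchReaderOptions {
    private final SketchConfiguration conf;

    SketchReaderOptions(SketchConfiguration conf) {
        this.conf = conf;
    }

    // "sketch.page.size" is an invented key used only for this example.
    int pageSize() {
        return conf.getInt("sketch.page.size", 1024 * 1024);
    }
}
```

In a design like this, an adapter that wraps Hadoop's `Configuration` and 
implements the same interface would let existing constructors and methods keep 
working while delegating to the interface-based ones.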



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
