azagrebin opened a new pull request #9617: [FLINK-13963] Consolidate Hadoop 
file systems usage and Hadoop integration docs
URL: https://github.com/apache/flink/pull/9617
 
 
   ## What is the purpose of the change
   
   We have Hadoop-related docs in several places at the moment:
   
   - **dev/batch/connectors.md** (Hadoop FS implementations and setup)
   - **dev/batch/hadoop_compatibility.md** (its claim that Flink always has 
Hadoop types out of the box is no longer valid, as we do not build and provide 
Flink with Hadoop by default)
   - **ops/filesystems/index.md** (plugins, Hadoop FS implementations and setup 
revisited)
   - **ops/deployment/hadoop.md** (Hadoop classpath)
   - **ops/config.md** (deprecated way to provide Hadoop configuration in Flink 
conf)
   
   We could consolidate all these pieces of docs into a consistent structure to 
help users to navigate through the docs to well-defined spots depending on 
which feature they are trying to use.
   
   The places in docs which should contain the information about Hadoop:
   
   - **dev/batch/hadoop_compatibility.md** (only DataSet API specific content 
about integration with Hadoop)
   - **ops/filesystems/index.md** (Flink FS plugins and Hadoop FS 
implementations)
   - **ops/deployment/hadoop.md** (Hadoop configuration and classpath)
   
   How to set up Hadoop itself should be described only in 
ops/deployment/hadoop.md. All other places dealing with Hadoop/HDFS should 
contain only their own related content and just reference that page for 'how 
to configure Hadoop'. Likewise, all chapters about writing to file systems 
(batch connectors and streaming file sinks) should just reference the file 
systems docs.
   
   ## Brief change log
   
   See previous section.
   
   ## Verifying this change
   
   Run `./docs/build_docs.sh -i` and open http://localhost:4000 in a browser to 
check the doc changes.
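
As a rough aid for reviewers, the sketch below lists the local preview URLs of the three pages that should hold the Hadoop information after this change. It assumes the local build serves each `.md` source as an `.html` page at the same relative path under http://localhost:4000/; that mapping and the helper name `preview_url` are assumptions for illustration, not part of the docs tooling.

```python
from urllib.parse import urljoin

# Doc sources this PR names as the places that should contain Hadoop information.
HADOOP_DOC_SOURCES = [
    "dev/batch/hadoop_compatibility.md",
    "ops/filesystems/index.md",
    "ops/deployment/hadoop.md",
]


def preview_url(source_path, base="http://localhost:4000/"):
    """Map a docs source path to its assumed local preview URL (.md -> .html)."""
    if not source_path.endswith(".md"):
        raise ValueError(f"not a Markdown source: {source_path}")
    # Hypothetical mapping: same relative path, with the extension swapped.
    return urljoin(base, source_path[: -len(".md")] + ".html")


if __name__ == "__main__":
    for path in HADOOP_DOC_SOURCES:
        print(preview_url(path))
```

Running the script while the local docs server is up prints one URL per page to open and check.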
   
   ## Does this pull request potentially affect one of the following parts:
   
     - Dependencies (does it add or upgrade a dependency): (no)
     - The public API, i.e., is any changed class annotated with 
`@Public(Evolving)`: (no)
     - The serializers: (no)
     - The runtime per-record code paths (performance sensitive): (no)
     - Anything that affects deployment or recovery: JobManager (and its 
components), Checkpointing, Yarn/Mesos, ZooKeeper: (yes)
     - The S3 file system connector: (yes)
   
   ## Documentation
   
     - Does this pull request introduce a new feature? (no)
     - If yes, how is the feature documented? (docs)
   
