alamb commented on code in PR #125:
URL: https://github.com/apache/parquet-site/pull/125#discussion_r2452123066
##########
content/en/_index.md:
##########
@@ -18,8 +18,8 @@ It provides high performance compression and encoding schemes
to handle complex
{{< blocks/section color="white" type="row">}}
-{{% blocks/feature icon="fab fa-jira" title="Parquet Specification"
url="https://github.com/apache/parquet-format" %}}
-Browse the Parquet Specification
+{{% blocks/feature icon="fab fa-jira" title="Documentation" url="docs" %}}
+Browse project documentation including the format specification.
Review Comment:
sending people straight to the spec from the homepage was a bit jarring
-- I suggest linking them to the intro page in the docs instead (which now also
has a link to the actual spec, along with a clearer description)
##########
content/en/docs/Overview/_index.md:
##########
@@ -9,37 +9,40 @@ description: >
Apache Parquet is an open source, column-oriented data file format designed
for efficient data storage and retrieval.
It provides high performance compression and encoding schemes to handle
complex data in bulk and is supported in many programming language and
analytics tools.
-This documentation contains information about both the
[parquet-java](https://github.com/apache/parquet-java) and
[parquet-format](https://github.com/apache/parquet-format) repositories.
-### parquet-format
+### parquet-format (Specification)
-The parquet-format repository hosts the official specification of the Apache
Parquet file format, defining how data is structured and stored. This
specification, along with Thrift metadata definitions and other crucial
components, is essential for developers to effectively read and write Parquet
files. The parquet-format project specifically contains the format
specifications needed to understand and properly utilize Parquet files.
+The [parquet-format] repository hosts the official specification of the
Parquet file format, defining how data is structured and stored. This
specification, along with the [parquet.thrift] Thrift metadata definitions, is
necessary for developing software effectively read and write Parquet files.
Review Comment:
the original wording was fairly redundant -- I tried to make it more concise
and added a link to parquet.thrift
##########
content/en/docs/Overview/_index.md:
##########
@@ -9,37 +9,40 @@ description: >
Apache Parquet is an open source, column-oriented data file format designed
for efficient data storage and retrieval.
It provides high performance compression and encoding schemes to handle
complex data in bulk and is supported in many programming language and
analytics tools.
-This documentation contains information about both the
[parquet-java](https://github.com/apache/parquet-java) and
[parquet-format](https://github.com/apache/parquet-format) repositories.
-### parquet-format
+### parquet-format (Specification)
-The parquet-format repository hosts the official specification of the Apache
Parquet file format, defining how data is structured and stored. This
specification, along with Thrift metadata definitions and other crucial
components, is essential for developers to effectively read and write Parquet
files. The parquet-format project specifically contains the format
specifications needed to understand and properly utilize Parquet files.
+The [parquet-format] repository hosts the official specification of the
Parquet file format, defining how data is structured and stored. This
specification, along with the [parquet.thrift] Thrift metadata definitions, is
necessary for developing software effectively read and write Parquet files.
-As a repository focused on specification, the parquet-format repository does
not contain source code.
+Note that the parquet-format repository does not contain source code for
libraries to read or write Parquet files, but rather the formal definitions and
documentation of the file format itself.
+[parquet-format]: https://github.com/apache/parquet-format
+[parquet.thrift]:
https://github.com/apache/parquet-format/blob/master/src/main/thrift/parquet.thrift
### parquet-java
-The parquet-java (formerly named 'parquet-mr') repository is part of the
Apache Parquet project and specifically focuses on providing Java tools for
handling the Parquet file format. Essentially, this repository includes all the
necessary Java libraries and modules that allow developers to read and write
Apache Parquet files.
+The [parquet-java] (formerly named `parquet-mr`) repository is part of the
Apache Parquet project and contains:
+* Java libraries to read and write Parquet files in Java applications.
+* Utilities and APIs for working with Parquet files, including tools for data
import/export, schema management, and data conversion.
-The parquet-java repository contains an implementation of the Apache Parquet
format. There are a number of other Parquet format implementations, which are
listed below.
-
-Included in parquet-java:
-* Java Implementation: It contains the core Java implementation of the Apache
Parquet format, making it possible to use Parquet files in Java applications.
-
-* Utilities and APIs: It provides various utilities and APIs for working with
Apache Parquet files, including tools for data import/export, schema
management, and data conversion.
+Note that there are a number of other implementations of the Parquet format,
some of which are listed below.
+[parquet-java]: https://github.com/apache/parquet-java
### Other Clients / Libraries / Tools
The Parquet ecosystem is rich and varied, encompassing a wide array of tools,
libraries, and clients, each offering different levels of feature support. It's
important to note that not all implementations support the same features of the
Parquet format. When integrating multiple Parquet implementations within your
workflow, it is crucial to conduct thorough testing to ensure compatibility and
performance across different platforms and tools.
-Here is a non-exhaustive list of Parquet implementations:
+You can find more information about the feature support of various Parquet
implementations on the [implementation status] page.
+
+[implementation status]: /docs/file-format/implementationstatus
+
+Here is a non-exhaustive list of open source Parquet implementations:
* [Parquet-java](https://github.com/apache/parquet-java)
* [Parquet C++, a subproject of Arrow
C++](https://github.com/apache/arrow/tree/main/cpp/src/parquet)
([documentation](https://arrow.apache.org/docs/cpp/parquet.html))
-* [Parquet Go, a subproject for Arrow
Go](https://github.com/apache/arrow-go/tree/main/parquet)
([documentation](https://github.com/apache/arrow-go/tree/main/parquet))
-* [Parquet
Rust](https://github.com/apache/arrow-rs/blob/main/parquet/README.md)
+* [Parquet Go, a subproject of Arrow
Go](https://github.com/apache/arrow-go/tree/main/parquet)
([documentation](https://github.com/apache/arrow-go/tree/main/parquet))
Review Comment:
I renamed these to be consistent with Parquet C++
##########
content/en/docs/Overview/_index.md:
##########
@@ -9,37 +9,40 @@ description: >
Apache Parquet is an open source, column-oriented data file format designed
for efficient data storage and retrieval.
It provides high performance compression and encoding schemes to handle
complex data in bulk and is supported in many programming language and
analytics tools.
-This documentation contains information about both the
[parquet-java](https://github.com/apache/parquet-java) and
[parquet-format](https://github.com/apache/parquet-format) repositories.
-### parquet-format
+### parquet-format (Specification)
-The parquet-format repository hosts the official specification of the Apache
Parquet file format, defining how data is structured and stored. This
specification, along with Thrift metadata definitions and other crucial
components, is essential for developers to effectively read and write Parquet
files. The parquet-format project specifically contains the format
specifications needed to understand and properly utilize Parquet files.
+The [parquet-format] repository hosts the official specification of the
Parquet file format, defining how data is structured and stored. This
specification, along with the [parquet.thrift] Thrift metadata definitions, is
necessary for developing software effectively read and write Parquet files.
-As a repository focused on specification, the parquet-format repository does
not contain source code.
+Note that the parquet-format repository does not contain source code for
libraries to read or write Parquet files, but rather the formal definitions and
documentation of the file format itself.
+[parquet-format]: https://github.com/apache/parquet-format
+[parquet.thrift]:
https://github.com/apache/parquet-format/blob/master/src/main/thrift/parquet.thrift
### parquet-java
-The parquet-java (formerly named 'parquet-mr') repository is part of the
Apache Parquet project and specifically focuses on providing Java tools for
handling the Parquet file format. Essentially, this repository includes all the
necessary Java libraries and modules that allow developers to read and write
Apache Parquet files.
+The [parquet-java] (formerly named `parquet-mr`) repository is part of the
Apache Parquet project and contains:
Review Comment:
I also made this more concise by removing much of the duplication
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.