[arrow-site] branch master updated: [MINOR] Improve formatting of json examples + links in nested parquet blogs (#256)

alamb Mon, 17 Oct 2022 14:27:00 -0700

This is an automated email from the ASF dual-hosted git repository.

alamb pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow-site.git



The following commit(s) were added to refs/heads/master by this push:
     new be4e5fc04c [MINOR] Improve formatting of json examples + links in nested parquet blogs (#256)
be4e5fc04c is described below

commit be4e5fc04cf908b4f3c1251b6bf84b47b13f6f10
Author: Andrew Lamb <[email protected]>
AuthorDate: Mon Oct 17 17:26:48 2022 -0400

    [MINOR] Improve formatting of json examples + links in nested parquet blogs (#256)
    
    * [MINOR]: Improve diagram markdown formatting
    
    * Tweak markdown and add links
    
    * Tweak
---
 _posts/2022-10-05-arrow-parquet-encoding-part-1.md |  6 +--
 _posts/2022-10-08-arrow-parquet-encoding-part-2.md | 62 +++++++++++-----------
 2 files changed, 34 insertions(+), 34 deletions(-)

diff --git a/_posts/2022-10-05-arrow-parquet-encoding-part-1.md b/_posts/2022-10-05-arrow-parquet-encoding-part-1.md
index 33d791ad92..a4688a7af8 100644
--- a/_posts/2022-10-05-arrow-parquet-encoding-part-1.md
+++ b/_posts/2022-10-05-arrow-parquet-encoding-part-1.md
@@ -44,7 +44,7 @@ First, it is necessary to take a step back and discuss the 
difference between co
 
 For example
 
-```json
+```python
 {"Column1": 1, "Column2": 2}
 {"Column1": 3, "Column2": 4, "Column3": 5}
 {"Column1": 5, "Column2": 4, "Column3": 5}
@@ -52,7 +52,7 @@ For example
 
 In a columnar representation, the data for a given column is instead stored 
contiguously
 
-```text
+```python
 Column1: [1, 3, 5]
 Column2: [2, 4, 4]
 Column3: [null, 5, 5]
@@ -147,4 +147,4 @@ Definition  Values
 
 ## Next up: Nested and Hierarchical Data
 
-Armed with the foundational understanding of how Arrow and Parquet store 
nullability / definition differently we are ready to move on to more complex 
nested types, which you can read about in our upcoming blog post on the topic 
<!-- I propose to update this text with a link when when we have published the 
next blog -->.
+Armed with the foundational understanding of how Arrow and Parquet store 
nullability / definition differently we are ready to move on to more complex 
nested types, which you can read about in our [next blog post on the 
topic](https://arrow.apache.org/blog/2022/10/08/arrow-parquet-encoding-part-2/).
diff --git a/_posts/2022-10-08-arrow-parquet-encoding-part-2.md b/_posts/2022-10-08-arrow-parquet-encoding-part-2.md
index f871c68530..c88b13eb25 100644
--- a/_posts/2022-10-08-arrow-parquet-encoding-part-2.md
+++ b/_posts/2022-10-08-arrow-parquet-encoding-part-2.md
@@ -37,25 +37,25 @@ Both Parquet and Arrow have the concept of a *struct* 
column, which is a column
 
 For example, consider the following three JSON documents
 
-```json
-{              <-- First record
-  "a": 1,      <-- the top level fields are a, b, c, and d
-  "b": {       <-- b is always provided (not nullable)
-    "b1": 1,   <-- b1 and b2 are "nested" fields of "b"
-    "b2": 3    <-- b2 is always provided (not nullable)
+```python
+{              # <-- First record
+  "a": 1,      # <-- the top level fields are a, b, c, and d
+  "b": {       # <-- b is always provided (not nullable)
+    "b1": 1,   # <-- b1 and b2 are "nested" fields of "b"
+    "b2": 3    # <-- b2 is always provided (not nullable)
    },
  "d": {
-   "d1":  1    <-- d1 is a "nested" field of "d"
+   "d1":  1    # <-- d1 is a "nested" field of "d"
   }
 }
 ```
-```json
-{              <-- Second record
+```python
+{              # <-- Second record
   "a": 2,
   "b": {
-    "b2": 4    <-- note "b1" is NULL in this record
+    "b2": 4    # <-- note "b1" is NULL in this record
   },
-  "c": {       <-- note "c" was NULL in the first record
+  "c": {       # <-- note "c" was NULL in the first record
     "c1": 6        but when "c" is provided, c1 is also
   },               always provided (not nullable)
   "d": {
@@ -64,8 +64,8 @@ For example, consider the following three JSON documents
   }
 }
 ```
-```json
-{              <-- Third record
+```python
+{              # <-- Third record
   "b": {
     "b1": 5,
     "b2": 6
@@ -77,7 +77,7 @@ For example, consider the following three JSON documents
 ```
 Documents of this format could be stored in an Arrow `StructArray` with this 
schema
 
-```text
+```python
 Field(name: "a", nullable: true, datatype: Int32)
 Field(name: "b", nullable: false, datatype: Struct[
   Field(name: "b1", nullable: true, datatype: Int32),
@@ -144,14 +144,14 @@ For example consider the case of `d.d2`, which contains 
two nullable levels `d`
 
 A definition level of `0` would imply a null at the level of `d`:
 
-```json
+```python
 {
 }
 ```
 
 A definition level of `1` would imply a null at the level of `d`
 
-```json
+```python
 {
   "d": { null }
 }
@@ -159,7 +159,7 @@ A definition level of `1` would imply a null at the level 
of `d`
 
 A definition level of `2` would imply a defined value for `d.d2`:
 
-```json
+```python
 {
   "d": { "d2": .. }
 }
@@ -168,7 +168,7 @@ A definition level of `2` would imply a defined value for 
`d.d2`:
 
 Going back to the three JSON documents above, they could be stored in Parquet 
with this schema
 
-```text
+```python
 message schema {
   optional int32 a;
   required group b {
@@ -230,29 +230,29 @@ The Parquet encoding of the example would be:
 
 Closing out support for nested types are *lists*, which contain a variable 
number of other values. For example, the following four documents each have a 
(nullable) field `a` containing a list of integers
 
-```json
-{                     <-- First record
-  "a": [1],           <-- top-level field a containing list of integers
+```python
+{                     # <-- First record
+  "a": [1],           # <-- top-level field a containing list of integers
 }
 ```
-```json
-{                     <-- "a" is not provided (is null)
+```python
+{                     # <-- "a" is not provided (is null)
 }
 ```
-```json
-{                     <-- "a" is non-null but empty
+```python
+{                     # <-- "a" is non-null but empty
   "a": []
 }
 ```
-```json
+```python
 {
-  "a": [null, 2],     <-- "a" has a null and non-null elements
+  "a": [null, 2],     # <-- "a" has a null and non-null elements
 }
 ```
 
 Documents of this format could be stored in this Arrow schema
 
-```text
+```python
 Field(name: "a", nullable: true, datatype: List(
   Field(name: "element", nullable: true, datatype: Int32),
 )
@@ -262,7 +262,7 @@ As before, Arrow chooses to represent this in a 
hierarchical fashion as a `ListA
 
 For example, a list with offsets `[0, 2, 3, 3]` contains 3 pairs of offsets, 
`(0,2)`, `(2,3)`, and `(3,3)`, and therefore represents a `ListArray` of length 
3 with the following values:
 
-```text
+```python
 0: [child[0], child[1]]
 1: []
 2: [child[2]]
@@ -299,7 +299,7 @@ More technical detail is available in the [ListArray format 
specification](https
 
 The example above with 4 JSON documents can be stored in this Parquet schema
 
-```text
+```python
 message schema {
   optional group a (LIST) {
     repeated group list {
@@ -343,6 +343,6 @@ The example above would therefore be encoded as
 
 ## Next up: Arbitrary Nesting: Lists of Structs and Structs of Lists
 
-In our final blog post <!-- When published, add link here --> we will explain 
how Parquet and Arrow combine these concepts to support arbitrary nesting of 
potentially nullable data structures.
+In our [final blog 
post](https://arrow.apache.org/blog/2022/10/17/arrow-parquet-encoding-part-3/), 
we explain how Parquet and Arrow combine these concepts to support arbitrary 
nesting of potentially nullable data structures.
 
 If you want to store and process structured types, you will be pleased to hear 
that the Rust [parquet](https://crates.io/crates/parquet) implementation fully 
supports reading and writing directly into Arrow, as simply as any other type. 
All the complex record shredding and reconstruction is handled automatically. 
With this and other exciting features such as  [reading 
asynchronously](https://docs.rs/parquet/22.0.0/parquet/arrow/async_reader/index.html)
 from [object storage](https://docs. [...]
