[flink-web] 01/02: Update the blog date of 'Pandas support in PyFlink'

dianfu Tue, 04 Aug 2020 00:34:47 -0700

This is an automated email from the ASF dual-hosted git repository.

dianfu pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/flink-web.git


commit 16a18c7349d05cf59bbabd6bb89c709bf46482fa
Author: Dian Fu <[email protected]>
AuthorDate: Tue Aug 4 15:22:20 2020 +0800

    Update the blog date of 'Pandas support in PyFlink'
---
 ...nk.md => 2020-08-04-pyflink-pandas-udf-support-flink.md} |   8 ++++----
 content/2020/07/28/pyflink-pandas-udf-support-flink.html    |   6 +++---
 .../mission-of-pyFlink.gif                                  | Bin
 .../python-scientific-stack.png                             | Bin
 .../vm-communication.png                                    | Bin
 5 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/_posts/2020-07-28-pyflink-pandas-udf-support-flink.md b/_posts/2020-08-04-pyflink-pandas-udf-support-flink.md
similarity index 98%
rename from _posts/2020-07-28-pyflink-pandas-udf-support-flink.md
rename to _posts/2020-08-04-pyflink-pandas-udf-support-flink.md
index f9871c5..ae0ee73 100644
--- a/_posts/2020-07-28-pyflink-pandas-udf-support-flink.md
+++ b/_posts/2020-08-04-pyflink-pandas-udf-support-flink.md
@@ -1,7 +1,7 @@
 ---
 layout: post
 title: "PyFlink: The integration of Pandas into PyFlink"
-date: 2020-07-28T12:00:00.000Z
+date: 2020-08-04T00:00:00.000Z
 authors:
 - Jincheng:
   name: "Jincheng Sun"
@@ -15,7 +15,7 @@ excerpt: The Apache Flink community put some great effort 
into integrating Panda
 Python has evolved into one of the most important programming languages for 
many fields of data processing. So big has been Python’s popularity, that it 
has pretty much become the default data processing language for data 
scientists. On top of that, there is a plethora of Python-based data processing 
tools such as NumPy, Pandas, and Scikit-learn that have gained additional 
popularity due to their flexibility or powerful functionalities. 
 
 <center>
-<img src="{{ site.baseurl }}/img/blog/2020-07-28-pyflink-pandas/python-scientific-stack.png" width="450px" alt="Python Scientific Stack"/>
+<img src="{{ site.baseurl }}/img/blog/2020-08-04-pyflink-pandas/python-scientific-stack.png" width="450px" alt="Python Scientific Stack"/>
 </center>
 <center>
   <a href="https://speakerdeck.com/jakevdp/the-unexpected-effectiveness-of-python-in-science?slide=52">Pic source: VanderPlas 2017, slide 52.</a>
@@ -50,7 +50,7 @@ While providing support for Python UDFs in PyFlink greatly 
improved the user exp
 The introduction of Pandas UDF is used to address these drawbacks. For Pandas 
UDF, a batch of rows is transferred between the JVM and PVM in a columnar 
format ([Arrow memory 
format](https://arrow.apache.org/docs/format/Columnar.html)). The batch of rows 
will be converted into a collection of Pandas Series and will be transferred to 
the Pandas UDF to then leverage popular Python libraries (such as Pandas, or 
NumPy) for the Python UDF implementation.
 
 <center>
-<img src="{{ site.baseurl }}/img/blog/2020-07-28-pyflink-pandas/vm-communication.png" width="550px" alt="VM Communication"/>
+<img src="{{ site.baseurl }}/img/blog/2020-08-04-pyflink-pandas/vm-communication.png" width="550px" alt="VM Communication"/>
 </center>
 
 
@@ -234,5 +234,5 @@ In this article, we introduce the integration of Pandas in 
Flink 1.11, including
 Future work by the community will focus on adding more features and bringing 
additional optimizations with follow up releases.  Such optimizations and 
additions include a Python DataStream API and more integration with the Python 
ecosystem, such as support for distributed Pandas in Flink. Stay tuned for more 
information and updates with the upcoming releases!
 
 <center>
-<img src="{{ site.baseurl }}/img/blog/2020-07-28-pyflink-pandas/mission-of-pyFlink.gif" width="600px" alt="Mission of PyFlink"/>
+<img src="{{ site.baseurl }}/img/blog/2020-08-04-pyflink-pandas/mission-of-pyFlink.gif" width="600px" alt="Mission of PyFlink"/>
 </center>
diff --git a/content/2020/07/28/pyflink-pandas-udf-support-flink.html b/content/2020/07/28/pyflink-pandas-udf-support-flink.html
index 05b2e6e..99a170a 100644
--- a/content/2020/07/28/pyflink-pandas-udf-support-flink.html
+++ b/content/2020/07/28/pyflink-pandas-udf-support-flink.html
@@ -211,7 +211,7 @@
 <p>Python has evolved into one of the most important programming languages for 
many fields of data processing. So big has been Python’s popularity, that it 
has pretty much become the default data processing language for data 
scientists. On top of that, there is a plethora of Python-based data processing 
tools such as NumPy, Pandas, and Scikit-learn that have gained additional 
popularity due to their flexibility or powerful functionalities.</p>
 
 <center>
-<img src="/img/blog/2020-07-28-pyflink-pandas/python-scientific-stack.png" width="450px" alt="Python Scientific Stack" />
+<img src="/img/blog/2020-08-04-pyflink-pandas/python-scientific-stack.png" width="450px" alt="Python Scientific Stack" />
 </center>
 <center>
   <a href="https://speakerdeck.com/jakevdp/the-unexpected-effectiveness-of-python-in-science?slide=52">Pic source: VanderPlas 2017, slide 52.</a>
@@ -255,7 +255,7 @@ Currently, only Scalar Pandas UDFs are supported in 
PyFlink.</p>
 <p>The introduction of Pandas UDF is used to address these drawbacks. For 
Pandas UDF, a batch of rows is transferred between the JVM and PVM in a 
columnar format (<a 
href="https://arrow.apache.org/docs/format/Columnar.html">Arrow memory 
format</a>). The batch of rows will be converted into a collection of Pandas 
Series and will be transferred to the Pandas UDF to then leverage popular 
Python libraries (such as Pandas, or NumPy) for the Python UDF 
implementation.</p>
 
 <center>
-<img src="/img/blog/2020-07-28-pyflink-pandas/vm-communication.png" width="550px" alt="VM Communication" />
+<img src="/img/blog/2020-08-04-pyflink-pandas/vm-communication.png" width="550px" alt="VM Communication" />
 </center>
 
 <p>The performance of vectorized UDFs is usually much higher when compared to 
the normal Python UDF, as the serialization/deserialization overhead is 
minimized by falling back to <a href="https://arrow.apache.org/">Apache 
Arrow</a>, while handling <code>pandas.Series</code> as input/output allows us 
to take full advantage of the Pandas and NumPy libraries, making it a popular 
solution to parallelize Machine Learning and other large-scale, distributed 
data science workloads (e.g. feature  [...]
@@ -407,7 +407,7 @@ With the function, you can register and use it in the same 
way as the <a href="h
 <p>Future work by the community will focus on adding more features and 
bringing additional optimizations with follow up releases.  Such optimizations 
and additions include a Python DataStream API and more integration with the 
Python ecosystem, such as support for distributed Pandas in Flink. Stay tuned 
for more information and updates with the upcoming releases!</p>
 
 <center>
-<img src="/img/blog/2020-07-28-pyflink-pandas/mission-of-pyFlink.gif" width="600px" alt="Mission of PyFlink" />
+<img src="/img/blog/2020-08-04-pyflink-pandas/mission-of-pyFlink.gif" width="600px" alt="Mission of PyFlink" />
 </center>
 
       </article>
diff --git a/img/blog/2020-07-28-pyflink-pandas/mission-of-pyFlink.gif b/img/blog/2020-08-04-pyflink-pandas/mission-of-pyFlink.gif
similarity index 100%
rename from img/blog/2020-07-28-pyflink-pandas/mission-of-pyFlink.gif
rename to img/blog/2020-08-04-pyflink-pandas/mission-of-pyFlink.gif
diff --git a/img/blog/2020-07-28-pyflink-pandas/python-scientific-stack.png b/img/blog/2020-08-04-pyflink-pandas/python-scientific-stack.png
similarity index 100%
rename from img/blog/2020-07-28-pyflink-pandas/python-scientific-stack.png
rename to img/blog/2020-08-04-pyflink-pandas/python-scientific-stack.png
diff --git a/img/blog/2020-07-28-pyflink-pandas/vm-communication.png b/img/blog/2020-08-04-pyflink-pandas/vm-communication.png
similarity index 100%
rename from img/blog/2020-07-28-pyflink-pandas/vm-communication.png
rename to img/blog/2020-08-04-pyflink-pandas/vm-communication.png
