Hi Andrew,

Can you please elaborate on blowing away the pip cache before committing the layer?
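
Is it something like this (untested sketch) that you mean, i.e. either installing with

RUN pip install --no-cache-dir pyyaml numpy cx_Oracle

or clearing the cache in the same layer,

RUN pip install pyyaml numpy cx_Oracle && rm -rf /root/.cache/pip

so the downloaded wheels don't end up baked into the image?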

Thanks,

Mich

On Tue, 17 Aug 2021 at 16:57, Andrew Melo <andrew.m...@gmail.com> wrote:

> Silly Q, did you blow away the pip cache before committing the layer? That
> always trips me up.
>
> Cheers
> Andrew
>
> On Tue, Aug 17, 2021 at 10:56 Mich Talebzadeh <mich.talebza...@gmail.com>
> wrote:
>
>> With no additional Python packages etc. we get 1.41GB, compared to 2.19GB
>> before:
>>
>> REPOSITORY       TAG                                      IMAGE ID       CREATED                  SIZE
>> spark/spark-py   3.1.1_sparkpy_3.7-scala_2.12-java8only   faee4dbb95dd   Less than a second ago   1.41GB
>> spark/spark-py   3.1.1_sparkpy_3.7-scala_2.12-java8       ba3c17bc9337   4 hours ago              2.19GB
>>
>> root@233a81199b43:/opt/spark/work-dir# pip list
>> Package       Version
>> ------------- -------
>> asn1crypto    0.24.0
>> cryptography  2.6.1
>> entrypoints   0.3
>> keyring       17.1.1
>> keyrings.alt  3.1.1
>> pip           21.2.4
>> pycrypto      2.6.1
>> PyGObject     3.30.4
>> pyxdg         0.25
>> SecretStorage 2.3.1
>> setuptools    57.4.0
>> six           1.12.0
>> wheel         0.32.3
>>
>>
>> HTH
>>
>>
>> On Tue, 17 Aug 2021 at 16:24, Mich Talebzadeh <mich.talebza...@gmail.com>
>> wrote:
>>
>>> Yes, I will double check. It includes Java 8 in addition to the base Java 11.
>>>
>>> In addition, it has these Python packages for now (added for my own needs):
>>>
>>> root@ce6773017a14:/opt/spark/work-dir# pip list
>>> Package       Version
>>> ------------- -------
>>> asn1crypto    0.24.0
>>> cryptography  2.6.1
>>> cx-Oracle     8.2.1
>>> entrypoints   0.3
>>> keyring       17.1.1
>>> keyrings.alt  3.1.1
>>> numpy         1.21.2
>>> pip           21.2.4
>>> py4j          0.10.9
>>> pycrypto      2.6.1
>>> PyGObject     3.30.4
>>> pyspark       3.1.2
>>> pyxdg         0.25
>>> PyYAML        5.4.1
>>> SecretStorage 2.3.1
>>> setuptools    57.4.0
>>> six           1.12.0
>>> wheel         0.32.3
>>>
>>>
>>> HTH
>>>
>>>
>>> On Tue, 17 Aug 2021 at 16:17, Maciej <mszymkiew...@gmail.com> wrote:
>>>
>>>> Quick question ‒ is this actual output? If so, do we know what accounts for
>>>> the ~1.5GB overhead of the PySpark image? Even without --no-install-recommends
>>>> this seems like a lot (if I recall correctly it was around 400MB for the
>>>> existing images).
>>>>
>>>>
>>>> On 8/17/21 2:24 PM, Mich Talebzadeh wrote:
>>>>
>>>> Examples:
>>>>
>>>> *docker images*
>>>>
>>>> REPOSITORY       TAG                                  IMAGE ID       CREATED          SIZE
>>>> spark/spark-py   3.1.1_sparkpy_3.7-scala_2.12-java8   ba3c17bc9337   2 minutes ago    2.19GB
>>>> spark            3.1.1-scala_2.12-java11              4595c4e78879   18 minutes ago   635MB
>>>>
>>>>
>>>> On Tue, 17 Aug 2021 at 10:31, Mich Talebzadeh <
>>>> mich.talebza...@gmail.com> wrote:
>>>>
>>>>> 3.1.2_sparkpy_3.7-scala_2.12-java11
>>>>>
>>>>> 3.1.2_sparkR_3.6-scala_2.12-java11
>>>>> Yes, let us go with that, and remember that we can change the tags anytime.
>>>>> The accompanying release notes should detail what is inside the downloaded
>>>>> image.
>>>>>
>>>>> +1 for me
>>>>>
>>>>>
>>>>> On Tue, 17 Aug 2021 at 09:51, Maciej <mszymkiew...@gmail.com> wrote:
>>>>>
>>>>>> On 8/17/21 4:04 AM, Holden Karau wrote:
>>>>>>
>>>>>> These are some really good points all around.
>>>>>>
>>>>>> I think, in the interest of simplicity, we'll start with just the 3
>>>>>> current Dockerfiles in the Spark repo, but for the next release (3.3) we
>>>>>> should explore adding some more Dockerfiles/build options.
>>>>>>
>>>>>> Sounds good.
>>>>>>
>>>>>> However, I'd consider adding the guest language version to the tag names, i.e.
>>>>>>
>>>>>> 3.1.2_sparkpy_3.7-scala_2.12-java11
>>>>>>
>>>>>> 3.1.2_sparkR_3.6-scala_2.12-java11
>>>>>>
>>>>>> and some basic safeguards in the layers, to make sure that these are
>>>>>> really the versions we use.
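>>>>>>
>>>>>> For example (just a rough, untested sketch), a safeguard for a ...-3.7 tag
>>>>>> could be a build-time assertion in the Python layer that fails the build if
>>>>>> the interpreter drifts:
>>>>>>
>>>>>> RUN python3 -c "import sys; assert sys.version_info[:2] == (3, 7), sys.version"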
>>>>>>
>>>>>> On Mon, Aug 16, 2021 at 10:46 AM Maciej <mszymkiew...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> I have a few concerns regarding PySpark and SparkR images.
>>>>>>>
>>>>>>> First of all, how do we plan to handle interpreter versions?
>>>>>>> Ideally, we should provide images for all supported variants, but based 
>>>>>>> on
>>>>>>> the preceding discussion and the proposed naming convention, I assume 
>>>>>>> it is
>>>>>>> not going to happen. If that's the case, it would be great if we could 
>>>>>>> fix
>>>>>>> interpreter versions based on some support criteria (lowest supported,
>>>>>>> lowest non-deprecated, highest supported at the time of release, etc.)
>>>>>>>
>>>>>>> Currently, we use the following:
>>>>>>>
>>>>>>>    - for R, we use the buster-cran35 Debian repositories, which install R
>>>>>>>    3.6 (the provided version already changed in the past and broke the
>>>>>>>    image build ‒ SPARK-28606).
>>>>>>>    - for Python, we depend on the system-provided python3 packages, which
>>>>>>>    currently provide Python 3.7.
>>>>>>>
>>>>>>> which don't guarantee stability over time and might be hard to
>>>>>>> synchronize with our support matrix.
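>>>>>>>
>>>>>>> One option (just a sketch, with a hypothetical build-arg name) would be to
>>>>>>> make the intended interpreter version explicit in the Dockerfile, so a
>>>>>>> distribution change fails the build loudly instead of silently swapping the
>>>>>>> interpreter:
>>>>>>>
>>>>>>> ARG python_version=3.7
>>>>>>> RUN apt-get update && \
>>>>>>>     apt-get install -y --no-install-recommends python${python_version} python3-pip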
>>>>>>>
>>>>>>> Secondly, omitting libraries which are required for the full
>>>>>>> functionality and performance, specifically
>>>>>>>
>>>>>>>    - Numpy, Pandas and Arrow for PySpark
>>>>>>>    - Arrow for SparkR
>>>>>>>
>>>>>>> is likely to severely limit the usability of the images (out of these,
>>>>>>> Arrow is probably the hardest to manage, especially when you already depend
>>>>>>> on system packages to provide the R or Python interpreter).
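>>>>>>>
>>>>>>> The Python side is at least easy to script (a sketch; exact versions would
>>>>>>> have to be pinned against our support matrix):
>>>>>>>
>>>>>>> RUN pip install --no-cache-dir numpy pandas pyarrow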
>>>>>>>
>>>>>>> On 8/14/21 12:43 AM, Mich Talebzadeh wrote:
>>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> We can cater for multiple types (spark, spark-py and spark-r) and
>>>>>>> spark versions (assuming they are downloaded and available).
>>>>>>> The challenge is that these docker images, once built, are snapshots. They
>>>>>>> cannot be amended later: if you change anything by going inside the running
>>>>>>> container, whatever you did is lost as soon as you log out.
>>>>>>>
>>>>>>> For example, I want to add tensorflow to my docker image. These are
>>>>>>> my images
>>>>>>>
>>>>>>> REPOSITORY                             TAG           IMAGE ID       CREATED      SIZE
>>>>>>> eu.gcr.io/axial-glow-224522/spark-py   java8_3.1.1   cfbb0e69f204   5 days ago   2.37GB
>>>>>>> eu.gcr.io/axial-glow-224522/spark      3.1.1         8d1bf8e7e47d   5 days ago   805MB
>>>>>>>
>>>>>>> Using the image ID, I try to log in to the image as root:
>>>>>>>
>>>>>>> *docker run -u0 -it cfbb0e69f204 bash*
>>>>>>>
>>>>>>> root@b542b0f1483d:/opt/spark/work-dir# pip install keras
>>>>>>> Collecting keras
>>>>>>>   Downloading keras-2.6.0-py2.py3-none-any.whl (1.3 MB)
>>>>>>>      |████████████████████████████████| 1.3 MB 1.1 MB/s
>>>>>>> Installing collected packages: keras
>>>>>>> Successfully installed keras-2.6.0
>>>>>>> WARNING: Running pip as the 'root' user can result in broken
>>>>>>> permissions and conflicting behaviour with the system package manager. 
>>>>>>> It
>>>>>>> is recommended to use a virtual environment instead:
>>>>>>> https://pip.pypa.io/warnings/venv
>>>>>>> root@b542b0f1483d:/opt/spark/work-dir# pip list
>>>>>>> Package       Version
>>>>>>> ------------- -------
>>>>>>> asn1crypto    0.24.0
>>>>>>> cryptography  2.6.1
>>>>>>> cx-Oracle     8.2.1
>>>>>>> entrypoints   0.3
>>>>>>> *keras         2.6.0      <--- it is here*
>>>>>>> keyring       17.1.1
>>>>>>> keyrings.alt  3.1.1
>>>>>>> numpy         1.21.1
>>>>>>> pip           21.2.3
>>>>>>> py4j          0.10.9
>>>>>>> pycrypto      2.6.1
>>>>>>> PyGObject     3.30.4
>>>>>>> pyspark       3.1.2
>>>>>>> pyxdg         0.25
>>>>>>> PyYAML        5.4.1
>>>>>>> SecretStorage 2.3.1
>>>>>>> setuptools    57.4.0
>>>>>>> six           1.12.0
>>>>>>> wheel         0.32.3
>>>>>>> root@b542b0f1483d:/opt/spark/work-dir# exit
>>>>>>>
>>>>>>> Now I have exited from the container and try to log in again:
>>>>>>> (pyspark_venv) hduser@rhes76: /home/hduser/dba/bin/build> docker
>>>>>>> run -u0 -it cfbb0e69f204 bash
>>>>>>>
>>>>>>> root@5231ee95aa83:/opt/spark/work-dir# pip list
>>>>>>> Package       Version
>>>>>>> ------------- -------
>>>>>>> asn1crypto    0.24.0
>>>>>>> cryptography  2.6.1
>>>>>>> cx-Oracle     8.2.1
>>>>>>> entrypoints   0.3
>>>>>>> keyring       17.1.1
>>>>>>> keyrings.alt  3.1.1
>>>>>>> numpy         1.21.1
>>>>>>> pip           21.2.3
>>>>>>> py4j          0.10.9
>>>>>>> pycrypto      2.6.1
>>>>>>> PyGObject     3.30.4
>>>>>>> pyspark       3.1.2
>>>>>>> pyxdg         0.25
>>>>>>> PyYAML        5.4.1
>>>>>>> SecretStorage 2.3.1
>>>>>>> setuptools    57.4.0
>>>>>>> six           1.12.0
>>>>>>> wheel         0.32.3
>>>>>>>
>>>>>>> *Hm, that keras is not there*. The docker image cannot be altered after
>>>>>>> build: once the docker image is created it is just a snapshot, and every
>>>>>>> new container starts from that snapshot. However, it will still have tons
>>>>>>> of useful stuff for most users/organisations. My suggestion is to create,
>>>>>>> for a given type (spark, spark-py etc):
>>>>>>>
>>>>>>>
>>>>>>>    1. One vanilla flavour for everyday use with a few useful packages
>>>>>>>    2. One for medium use with the most common packages for ETL/ELT stuff
>>>>>>>    3. One specialist flavour for ML etc. with keras, tensorflow and
>>>>>>>    anything else needed
>>>>>>>
>>>>>>>
>>>>>>> These images should be maintained as we currently maintain Spark releases,
>>>>>>> with accompanying documentation. Any reason why we cannot maintain them
>>>>>>> ourselves?
>>>>>>>
>>>>>>> HTH
>>>>>>>
>>>>>>>
>>>>>>> On Fri, 13 Aug 2021 at 17:26, Holden Karau <hol...@pigscanfly.ca>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> So we actually do have a script that does the build already; it's more a
>>>>>>>> matter of publishing the results for easier use. Currently the script
>>>>>>>> produces three images: spark, spark-py, and spark-r. I can certainly see a
>>>>>>>> solid reason to publish with a jdk11 & jdk8 suffix as well if there is
>>>>>>>> interest in the community. If we want to have, say, a spark-py-pandas image
>>>>>>>> with everything necessary for the Koalas stuff to work, then I think that
>>>>>>>> could be a great PR for someone to add :)
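>>>>>>>>
>>>>>>>> For reference, the script is bin/docker-image-tool.sh in the distribution;
>>>>>>>> a typical invocation that builds all three images looks roughly like
>>>>>>>>
>>>>>>>> ./bin/docker-image-tool.sh -r <repo> -t 3.1.1 \
>>>>>>>>   -p ./kubernetes/dockerfiles/spark/bindings/python/Dockerfile \
>>>>>>>>   -R ./kubernetes/dockerfiles/spark/bindings/R/Dockerfile build
>>>>>>>>
>>>>>>>> with a corresponding push command to publish the results.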
>>>>>>>>
>>>>>>>> On Fri, Aug 13, 2021 at 1:00 AM Mich Talebzadeh <
>>>>>>>> mich.talebza...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> should read PySpark
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Fri, 13 Aug 2021 at 08:51, Mich Talebzadeh <
>>>>>>>>> mich.talebza...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> Agreed.
>>>>>>>>>>
>>>>>>>>>> I have already built a few of the latest images for Spark and PySpark on
>>>>>>>>>> 3.1.1 with Java 8, as I found out Java 11 does not work with the Google
>>>>>>>>>> BigQuery data warehouse. However, how to hack the Dockerfile is something
>>>>>>>>>> one finds out the hard way.
>>>>>>>>>>
>>>>>>>>>> For example, how to add additional Python libraries like tensorflow etc.
>>>>>>>>>> Loading these libraries through Kubernetes is not practical, as unzipping
>>>>>>>>>> and installing them through --py-files etc. will take considerable time,
>>>>>>>>>> so they need to be added to the Dockerfile at build time, in the directory
>>>>>>>>>> for Python under Kubernetes:
>>>>>>>>>>
>>>>>>>>>> /opt/spark/kubernetes/dockerfiles/spark/bindings/python
>>>>>>>>>>
>>>>>>>>>> RUN pip install pyyaml numpy cx_Oracle tensorflow ....
>>>>>>>>>>
>>>>>>>>>> Also you will need curl to test the ports from inside the docker container:
>>>>>>>>>>
>>>>>>>>>> RUN apt-get update && apt-get install -y curl
>>>>>>>>>> RUN ["apt-get","install","-y","vim"]
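>>>>>>>>>>
>>>>>>>>>> Consolidated into single layers with the caches cleaned up, that might
>>>>>>>>>> look something like this (a sketch; the package list obviously depends on
>>>>>>>>>> what the image is for):
>>>>>>>>>>
>>>>>>>>>> RUN apt-get update && \
>>>>>>>>>>     apt-get install -y --no-install-recommends curl vim && \
>>>>>>>>>>     rm -rf /var/lib/apt/lists/*
>>>>>>>>>> RUN pip install --no-cache-dir pyyaml numpy cx_Oracle tensorflow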
>>>>>>>>>>
>>>>>>>>>> As I said, I am happy to build these specific Dockerfiles plus the
>>>>>>>>>> complete documentation for them. I have already built one for Google
>>>>>>>>>> (GCP). The difference between the Spark and PySpark versions is that in
>>>>>>>>>> Spark/Scala a fat jar file will contain all that is needed. That is not
>>>>>>>>>> the case with Python, I am afraid.
>>>>>>>>>>
>>>>>>>>>> HTH
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Fri, 13 Aug 2021 at 08:13, Bode, Meikel, NMA-CFD <
>>>>>>>>>> meikel.b...@bertelsmann.de> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi all,
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> I am Meikel Bode and only an interested reader of the dev and user
>>>>>>>>>>> lists. Anyway, I would appreciate having official docker images available.
>>>>>>>>>>>
>>>>>>>>>>> Maybe one could get inspiration from the Jupyter docker stacks
>>>>>>>>>>> and provide a hierarchy of different images like this:
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> https://jupyter-docker-stacks.readthedocs.io/en/latest/using/selecting.html#image-relationships
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Having a core image supporting only Java, and extended ones supporting
>>>>>>>>>>> Python and/or R, etc.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Looking forward to the discussion.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Best,
>>>>>>>>>>>
>>>>>>>>>>> Meikel
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> *From:* Mich Talebzadeh <mich.talebza...@gmail.com>
>>>>>>>>>>> *Sent:* Freitag, 13. August 2021 08:45
>>>>>>>>>>> *Cc:* dev <dev@spark.apache.org>
>>>>>>>>>>> *Subject:* Re: Time to start publishing Spark Docker Images?
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> I concur this is a good idea and certainly worth exploring.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> In practice, preparing docker images as deployables will throw up some
>>>>>>>>>>> challenges, because a docker image for Spark is not really a singular
>>>>>>>>>>> modular unit in the way that, say, a docker image for Jenkins is. It
>>>>>>>>>>> involves different versions and different images for Spark and PySpark,
>>>>>>>>>>> and will most likely end up as part of a Kubernetes deployment.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Individuals and organisations will deploy it as the first cut. Great,
>>>>>>>>>>> but I equally feel that good documentation on how to build a consumable,
>>>>>>>>>>> deployable image will be more valuable. From my own experience the
>>>>>>>>>>> current documentation should be enhanced, for example how to deploy
>>>>>>>>>>> working directories, add additional Python packages, and build with
>>>>>>>>>>> different Java versions (version 8 or version 11), etc.
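>>>>>>>>>>>
>>>>>>>>>>> As an example of the kind of thing the documentation could spell out:
>>>>>>>>>>> if I remember correctly, the base Dockerfile takes the Java image tag as
>>>>>>>>>>> a build argument, so building a Java 8 image is roughly
>>>>>>>>>>>
>>>>>>>>>>> ./bin/docker-image-tool.sh -r <repo> -t 3.1.1-java8 \
>>>>>>>>>>>   -b java_image_tag=8-jre-slim build
>>>>>>>>>>>
>>>>>>>>>>> but none of this is obvious until you have dug through the script and
>>>>>>>>>>> the Dockerfiles.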
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> HTH
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Fri, 13 Aug 2021 at 01:54, Holden Karau <hol...@pigscanfly.ca>
>>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>> Awesome, I've filed an INFRA ticket to get the ball rolling.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Thu, Aug 12, 2021 at 5:48 PM John Zhuge <jzh...@apache.org>
>>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>> +1
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Thu, Aug 12, 2021 at 5:44 PM Hyukjin Kwon <
>>>>>>>>>>> gurwls...@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>> +1, I think we generally agreed upon having it. Thanks Holden for the
>>>>>>>>>>> heads-up and for driving this.
>>>>>>>>>>>
>>>>>>>>>>> +@Dongjoon Hyun <dongj...@apache.org> FYI
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Thu, 22 Jul 2021 at 12:22 PM, Kent Yao <yaooq...@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>> +1
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Bests,
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> *Kent Yao*
> It's dark in this basement.
>