should read PySpark


   view my Linkedin profile
<https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>



*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.




On Fri, 13 Aug 2021 at 08:51, Mich Talebzadeh <mich.talebza...@gmail.com>
wrote:

> Agreed.
>
> I have already built a few latest for Spark and PYSpark on 3.1.1 with Java
> 8 as I found out Java 11 does not work with Google BigQuery data warehouse.
> However, to hack the Dockerfile one finds out the hard way.
>
> For example how to add additional Python libraries like tensorflow etc.
> Loading these libraries through Kubernetes is not practical as unzipping
> and installing it through --py-files etc will take considerable time so
> they need to be added to the dockerfile at the built time in directory for
> Python under Kubernetes
>
> /opt/spark/kubernetes/dockerfiles/spark/bindings/python
>
> RUN pip install pyyaml numpy cx_Oracle tensorflow ....
>
> Also you will need curl to test the ports from inside the docker
>
> RUN apt-get update && apt-get install -y curl
> RUN ["apt-get","install","-y","vim"]
>
> As I said I am happy to build these specific dockerfiles plus the complete
> documentation for it. I have already built one for Google (GCP). The
> difference between Spark and PySpark version is that in Spark/scala a fat
> jar file will contain all needed. That is not the case with Python I am
> afraid.
>
> HTH
>
>
>    view my Linkedin profile
> <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>
>
>
>
> *Disclaimer:* Use it at your own risk. Any and all responsibility for any
> loss, damage or destruction of data or any other property which may arise
> from relying on this email's technical content is explicitly disclaimed.
> The author will in no case be liable for any monetary damages arising from
> such loss, damage or destruction.
>
>
>
>
> On Fri, 13 Aug 2021 at 08:13, Bode, Meikel, NMA-CFD <
> meikel.b...@bertelsmann.de> wrote:
>
>> Hi all,
>>
>>
>>
>> I am Meikel Bode and only an interested reader of dev and user list.
>> Anyway, I would appreciate to have official docker images available.
>>
>> Maybe one could get inspiration from the Jupyter docker stacks and
>> provide an hierarchy of different images like this:
>>
>>
>>
>>
>> https://jupyter-docker-stacks.readthedocs.io/en/latest/using/selecting.html#image-relationships
>>
>>
>>
>> Having a core image only supporting Java, an extended supporting Python
>> and/or R etc.
>>
>>
>>
>> Looking forward to the discussion.
>>
>>
>>
>> Best,
>>
>> Meikel
>>
>>
>>
>> *From:* Mich Talebzadeh <mich.talebza...@gmail.com>
>> *Sent:* Freitag, 13. August 2021 08:45
>> *Cc:* dev <dev@spark.apache.org>
>> *Subject:* Re: Time to start publishing Spark Docker Images?
>>
>>
>>
>> I concur this is a good idea and certainly worth exploring.
>>
>>
>>
>> In practice, preparing docker images as deployable will throw some
>> challenges because creating docker for Spark  is not really a singular
>> modular unit, say  creating docker for Jenkins. It involves different
>> versions and different images for Spark and PySpark and most likely will
>> end up as part of Kubernetes deployment.
>>
>>
>>
>> Individuals and organisations will deploy it as the first cut. Great but
>> I equally feel that good documentation on how to build a consumable
>> deployable image will be more valuable.  FRom my own experience the current
>> documentation should be enhanced, for example how to deploy working
>> directories, additional Python packages, build with different Java
>> versions  (version 8 or version 11) etc.
>>
>>
>>
>> HTH
>>
>>
>>
>>    view my Linkedin profile
>> <https://eur02.safelinks.protection.outlook.com/?url=https%3A%2F%2Fwww.linkedin.com%2Fin%2Fmich-talebzadeh-ph-d-5205b2%2F&data=04%7C01%7CMeikel.Bode%40bertelsmann.de%7Cd97d97be540246aa975308d95e260c99%7C1ca8bd943c974fc68955bad266b43f0b%7C0%7C0%7C637644339790679755%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=0CkL3HZo9FNVUOnLQ4CYs29Z9HfrwE4xDqLgVmMbr10%3D&reserved=0>
>>
>>
>>
>> *Disclaimer:* Use it at your own risk. Any and all responsibility for
>> any loss, damage or destruction of data or any other property which may
>> arise from relying on this email's technical content is explicitly
>> disclaimed. The author will in no case be liable for any monetary damages
>> arising from such loss, damage or destruction.
>>
>>
>>
>>
>>
>>
>>
>> On Fri, 13 Aug 2021 at 01:54, Holden Karau <hol...@pigscanfly.ca> wrote:
>>
>> Awesome, I've filed an INFRA ticket to get the ball rolling.
>>
>>
>>
>> On Thu, Aug 12, 2021 at 5:48 PM John Zhuge <jzh...@apache.org> wrote:
>>
>> +1
>>
>>
>>
>> On Thu, Aug 12, 2021 at 5:44 PM Hyukjin Kwon <gurwls...@gmail.com> wrote:
>>
>> +1, I think we generally agreed upon having it. Thanks Holden for headsup
>> and driving this.
>>
>> +@Dongjoon Hyun <dongj...@apache.org> FYI
>>
>>
>>
>> 2021년 7월 22일 (목) 오후 12:22, Kent Yao <yaooq...@gmail.com>님이 작성:
>>
>> +1
>>
>>
>>
>> Bests,
>>
>>
>>
>> *Kent Yao*
>>
>> @ Data Science Center, Hangzhou Research Institute, NetEase Corp.
>>
>> *a spark* *enthusiast*
>>
>> *kyuubi
>> <https://eur02.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fyaooqinn%2Fkyuubi&data=04%7C01%7CMeikel.Bode%40bertelsmann.de%7Cd97d97be540246aa975308d95e260c99%7C1ca8bd943c974fc68955bad266b43f0b%7C0%7C0%7C637644339790679755%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=ZkE%2BAK4%2BUO9JsDzZlAfY5gsATCVm5hidLCp7EGxAWiY%3D&reserved=0>**is
>> a unified* *multi-tenant* *JDBC interface for large-scale data
>> processing and analytics,* *built on top of* *Apache Spark
>> <https://eur02.safelinks.protection.outlook.com/?url=http%3A%2F%2Fspark.apache.org%2F&data=04%7C01%7CMeikel.Bode%40bertelsmann.de%7Cd97d97be540246aa975308d95e260c99%7C1ca8bd943c974fc68955bad266b43f0b%7C0%7C0%7C637644339790689711%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=4YYZ61B6datdx2GsxqnEUOpYuJUn35egYRQSVnUxtF0%3D&reserved=0>*
>> *.*
>> *spark-authorizer
>> <https://eur02.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fyaooqinn%2Fspark-authorizer&data=04%7C01%7CMeikel.Bode%40bertelsmann.de%7Cd97d97be540246aa975308d95e260c99%7C1ca8bd943c974fc68955bad266b43f0b%7C0%7C0%7C637644339790689711%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=P6TMaSh7UeXVyv79RiRqdBpipaIjh2o3DhRs0GGhWF4%3D&reserved=0>**A
>> Spark SQL extension which provides SQL Standard Authorization for* *Apache
>> Spark
>> <https://eur02.safelinks.protection.outlook.com/?url=http%3A%2F%2Fspark.apache.org%2F&data=04%7C01%7CMeikel.Bode%40bertelsmann.de%7Cd97d97be540246aa975308d95e260c99%7C1ca8bd943c974fc68955bad266b43f0b%7C0%7C0%7C637644339790689711%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=4YYZ61B6datdx2GsxqnEUOpYuJUn35egYRQSVnUxtF0%3D&reserved=0>*
>> *.*
>> *spark-postgres
>> <https://eur02.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fyaooqinn%2Fspark-postgres&data=04%7C01%7CMeikel.Bode%40bertelsmann.de%7Cd97d97be540246aa975308d95e260c99%7C1ca8bd943c974fc68955bad266b43f0b%7C0%7C0%7C637644339790699667%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=cCM9mLZBaZTF4WYzm22eIf4CU%2FfiiWCD0FUSfXSmaJA%3D&reserved=0>
>>  **A
>> library for reading data from and transferring data to Postgres / Greenplum
>> with Spark SQL and DataFrames, 10~100x faster.*
>> *itatchi
>> <https://eur02.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fyaooqinn%2Fspark-func-extras&data=04%7C01%7CMeikel.Bode%40bertelsmann.de%7Cd97d97be540246aa975308d95e260c99%7C1ca8bd943c974fc68955bad266b43f0b%7C0%7C0%7C637644339790699667%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=sEhn0HXSzsPSBKhXZlzwQErwwFtcTdTYFqeG9FVpROU%3D&reserved=0>**A
>> library* *that brings useful functions from various modern database
>> management systems to* *Apache Spark
>> <https://eur02.safelinks.protection.outlook.com/?url=http%3A%2F%2Fspark.apache.org%2F&data=04%7C01%7CMeikel.Bode%40bertelsmann.de%7Cd97d97be540246aa975308d95e260c99%7C1ca8bd943c974fc68955bad266b43f0b%7C0%7C0%7C637644339790699667%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=ZIdOvEDv%2FDZWYAB3Bnm4cD1YBVVl3aaHjLiz1HSDsY0%3D&reserved=0>*
>> *.*
>>
>>
>>
>>
>>
>> On 07/22/2021 11:13,Holden Karau<hol...@pigscanfly.ca>
>> <hol...@pigscanfly.ca> wrote:
>>
>> Hi Folks,
>>
>>
>>
>> Many other distributed computing (https://hub.docker.com/r/rayproject/ray
>> <https://eur02.safelinks.protection.outlook.com/?url=https%3A%2F%2Fhub.docker.com%2Fr%2Frayproject%2Fray&data=04%7C01%7CMeikel.Bode%40bertelsmann.de%7Cd97d97be540246aa975308d95e260c99%7C1ca8bd943c974fc68955bad266b43f0b%7C0%7C0%7C637644339790709619%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=%2F%2BPp69I10cyEeSTp6POoNZObOpkkzcZfB35vcdkR8P8%3D&reserved=0>
>> https://hub.docker.com/u/daskdev
>> <https://eur02.safelinks.protection.outlook.com/?url=https%3A%2F%2Fhub.docker.com%2Fu%2Fdaskdev&data=04%7C01%7CMeikel.Bode%40bertelsmann.de%7Cd97d97be540246aa975308d95e260c99%7C1ca8bd943c974fc68955bad266b43f0b%7C0%7C0%7C637644339790709619%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=jrQU9WbtFLM1T71SVaZwa0U57F8GcBSFHmXiauQtou0%3D&reserved=0>)
>> and ASF projects (https://hub.docker.com/u/apache
>> <https://eur02.safelinks.protection.outlook.com/?url=https%3A%2F%2Fhub.docker.com%2Fu%2Fapache&data=04%7C01%7CMeikel.Bode%40bertelsmann.de%7Cd97d97be540246aa975308d95e260c99%7C1ca8bd943c974fc68955bad266b43f0b%7C0%7C0%7C637644339790719573%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=yD8NWSYhhL6%2BDb3D%2BfD%2F8ynKAL4Wp8BKDMHV0n7jHHM%3D&reserved=0>)
>> now publish their images to dockerhub.
>>
>>
>>
>> We've already got the docker image tooling in place, I think we'd need to
>> ask the ASF to grant permissions to the PMC to publish containers and
>> update the release steps but I think this could be useful for folks.
>>
>>
>>
>> Cheers,
>>
>>
>>
>> Holden
>>
>>
>>
>> --
>>
>> Twitter: https://twitter.com/holdenkarau
>> <https://eur02.safelinks.protection.outlook.com/?url=https%3A%2F%2Ftwitter.com%2Fholdenkarau&data=04%7C01%7CMeikel.Bode%40bertelsmann.de%7Cd97d97be540246aa975308d95e260c99%7C1ca8bd943c974fc68955bad266b43f0b%7C0%7C0%7C637644339790719573%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=4qhg1CzKNiiRZkbvzKMp7WL4BoYLzPZ%2FOpFwHu8KNmg%3D&reserved=0>
>>
>> Books (Learning Spark, High Performance Spark, etc.):
>> https://amzn.to/2MaRAG9
>> <https://eur02.safelinks.protection.outlook.com/?url=https%3A%2F%2Famzn.to%2F2MaRAG9&data=04%7C01%7CMeikel.Bode%40bertelsmann.de%7Cd97d97be540246aa975308d95e260c99%7C1ca8bd943c974fc68955bad266b43f0b%7C0%7C0%7C637644339790719573%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=5UCR1Qn0fLovLAdTFnJBnLYF3e2NRnL8wEYPhCfLf2A%3D&reserved=0>
>>
>> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
>> <https://eur02.safelinks.protection.outlook.com/?url=https%3A%2F%2Fwww.youtube.com%2Fuser%2Fholdenkarau&data=04%7C01%7CMeikel.Bode%40bertelsmann.de%7Cd97d97be540246aa975308d95e260c99%7C1ca8bd943c974fc68955bad266b43f0b%7C0%7C0%7C637644339790729540%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=LbsZdvDNTAc804N2dknen%2BoJavleIsh5vwpNaj7xIio%3D&reserved=0>
>>
>> --------------------------------------------------------------------- To
>> unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>>
>> --
>>
>> John Zhuge
>>
>>
>>
>>
>> --
>>
>> Twitter: https://twitter.com/holdenkarau
>> <https://eur02.safelinks.protection.outlook.com/?url=https%3A%2F%2Ftwitter.com%2Fholdenkarau&data=04%7C01%7CMeikel.Bode%40bertelsmann.de%7Cd97d97be540246aa975308d95e260c99%7C1ca8bd943c974fc68955bad266b43f0b%7C0%7C0%7C637644339790729540%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=x6fXgTuoQqVYqu9JPbt0hG2P0zl6l3p%2FrU5bDng85AY%3D&reserved=0>
>>
>> Books (Learning Spark, High Performance Spark, etc.):
>> https://amzn.to/2MaRAG9
>> <https://eur02.safelinks.protection.outlook.com/?url=https%3A%2F%2Famzn.to%2F2MaRAG9&data=04%7C01%7CMeikel.Bode%40bertelsmann.de%7Cd97d97be540246aa975308d95e260c99%7C1ca8bd943c974fc68955bad266b43f0b%7C0%7C0%7C637644339790729540%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=WCHuF%2BcEl0rBZyVOePRQT1AOefwRDlIavu9B0wDmmOk%3D&reserved=0>
>>
>> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
>> <https://eur02.safelinks.protection.outlook.com/?url=https%3A%2F%2Fwww.youtube.com%2Fuser%2Fholdenkarau&data=04%7C01%7CMeikel.Bode%40bertelsmann.de%7Cd97d97be540246aa975308d95e260c99%7C1ca8bd943c974fc68955bad266b43f0b%7C0%7C0%7C637644339790739490%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=52hSM52z%2FFRahVO%2FcRwJ6eDuDInvhhtt1xQfbhMRazQ%3D&reserved=0>
>>
>>

Reply via email to