It is. But you have a third-party library in here that seems to require a
different Scala version.
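
A quick way to confirm which Scala line your Spark runtime was built
against is to ask the JVM through py4j. This is only a sketch: _jvm is an
internal PySpark handle, but scala.util.Properties is a standard Scala
API.

    # Prints something like "version 2.12.17" on a stock Spark 3.4.x build
    print(spark.sparkContext._jvm.scala.util.Properties.versionString())

Whatever that prints, the Phoenix connector jar on your classpath must be
compiled for the same major Scala version (2.12 vs. 2.13).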

On Mon, Aug 21, 2023, 7:04 PM Kal Stevens <kalgstev...@gmail.com> wrote:

> OK, it was my impression that Scala was packaged with Spark to avoid a
> mismatch:
> https://spark.apache.org/downloads.html
>
> It looks like Spark 3.4.1 (my version) uses Scala 2.12.
> How do I specify the Scala version?
>
> On Mon, Aug 21, 2023 at 4:47 PM Sean Owen <sro...@gmail.com> wrote:
>
>> That's a mismatch between the Scala version your library was built
>> against and the Scala version Spark uses.
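>>
>> If you manage jars yourself, point Spark at a connector built for the
>> same Scala line when the session starts. A sketch only: the jar path
>> and file name below are placeholders, so grab the Phoenix connector
>> artifact that matches your Spark/Scala combination from the Phoenix
>> downloads page.
>>
>>     from pyspark.sql import SparkSession
>>
>>     spark = (
>>         SparkSession.builder
>>         .appName("phoenix-write")
>>         # Spark 3.4.x ships with Scala 2.12 by default, so the
>>         # connector jar must be a 2.12 build as well.
>>         .config("spark.jars", "/path/to/phoenix-spark-connector.jar")
>>         .getOrCreate()
>>     )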
>>
>> On Mon, Aug 21, 2023, 6:46 PM Kal Stevens <kalgstev...@gmail.com> wrote:
>>
>>> I am having a hard time figuring out what I am doing wrong here.
>>> I am not sure if I have an incompatible version of something installed
>>> or something else, and I cannot find anything relevant on Google to
>>> figure out what I am doing wrong.
>>> I am using *Spark 3.4.1* and *Python 3.10*.
>>>
>>> This is my code to save my DataFrame:
>>> urls = []
>>> pull_sitemap_xml(robot, urls)
>>> df = spark.createDataFrame(data=urls, schema=schema)
>>> df.write.format("org.apache.phoenix.spark") \
>>>     .mode("overwrite") \
>>>     .option("table", "property") \
>>>     .option("zkUrl", "192.168.1.162:2181") \
>>>     .save()
>>>
>>> urls is an array of maps (Python dicts), each containing a "url" and a
>>> "last_mod" field, roughly as sketched below.
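>>>
>>> For context, the rows and schema look something like this (a sketch;
>>> the real pull_sitemap_xml output may differ):
>>>
>>>     from pyspark.sql.types import StructType, StructField, StringType
>>>
>>>     # Each row is a dict with a "url" and a "last_mod" field
>>>     urls = [{"url": "https://example.com/a", "last_mod": "2023-08-20"}]
>>>     schema = StructType([
>>>         StructField("url", StringType(), True),
>>>         StructField("last_mod", StringType(), True),
>>>     ])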
>>>
>>> Here is the error that I am getting:
>>>
>>> Traceback (most recent call last):
>>>   File "/home/kal/real-estate/pullhttp/pull_properties.py", line 65, in main
>>>     .save()
>>>   File "/hadoop/spark/spark/python/lib/pyspark.zip/pyspark/sql/readwriter.py", line 1396, in save
>>>     self._jwrite.save()
>>>   File "/hadoop/spark/spark/python/lib/py4j-0.10.9.7-src.zip/py4j/java_gateway.py", line 1322, in __call__
>>>     return_value = get_return_value(
>>>   File "/hadoop/spark/spark/python/lib/pyspark.zip/pyspark/errors/exceptions/captured.py", line 169, in deco
>>>     return f(*a, **kw)
>>>   File "/hadoop/spark/spark/python/lib/py4j-0.10.9.7-src.zip/py4j/protocol.py", line 326, in get_return_value
>>>     raise Py4JJavaError(
>>> py4j.protocol.Py4JJavaError: An error occurred while calling o636.save.
>>> : java.lang.NoSuchMethodError: 'scala.collection.mutable.ArrayOps scala.Predef$.refArrayOps(java.lang.Object[])'
>>>   at org.apache.phoenix.spark.DataFrameFunctions.getFieldArray(DataFrameFunctions.scala:76)
>>>   at org.apache.phoenix.spark.DataFrameFunctions.saveToPhoenix(DataFrameFunctions.scala:35)
>>>   at org.apache.phoenix.spark.DataFrameFunctions.saveToPhoenix(DataFrameFunctions.scala:28)
>>>   at org.apache.phoenix.spark.DefaultSource.createRelation(DefaultSource.scala:47)
>>>   at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:47)
>>>   at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:75)
>>>   at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:73)
>>
