Re: Log4j 1.2.17 spark CVE

2021-12-13 Thread Qian Sun
My understanding is that we don’t need to do anything. log4j-core (2.x) is not
used in Spark.

> 2021年12月13日 下午12:45,Pralabh Kumar  写道:
> 
> Hi developers,  users 
> 
> Spark is built using log4j 1.2.17. Is there a plan to upgrade, given the
> recently detected CVE?
> 
> 
> Regards
> Pralabh kumar


-
To unsubscribe e-mail: user-unsubscr...@spark.apache.org



Re: Log4j 1.2.17 spark CVE

2021-12-13 Thread Sean Owen
You would want to shade this dependency in your app, in which case you
would be using log4j 2. If you don't shade and just include it, you will
also be using log4j 2 as some of the API classes are different. If they
overlap with log4j 1, you will probably hit errors anyway.
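
If it helps to make "shade this dependency in your app" concrete, here is a minimal sketch of what that could look like with the sbt-assembly plugin. This is an illustration under assumptions, not Spark guidance: the `myshaded.log4j` prefix is made up, and note that log4j 2 is known to be awkward to relocate because of its plugin descriptor files, so this may need extra merge handling in practice.

```scala
// build.sbt fragment (hypothetical): relocate the log4j 2.x classes bundled in
// the application jar so they cannot clash with the logging jars Spark ships.
// Requires the sbt-assembly plugin to be enabled in project/plugins.sbt.
assembly / assemblyShadeRules := Seq(
  ShadeRule.rename("org.apache.logging.log4j.**" -> "myshaded.log4j.@1").inAll
)
```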

On Mon, Dec 13, 2021 at 6:33 PM James Yu  wrote:

> Question: Spark uses log4j 1.2.17. If my application jar contains log4j 2.x
> and gets submitted to the Spark cluster, which version of log4j actually
> gets used during the Spark session?
> --
> *From:* Sean Owen 
> *Sent:* Monday, December 13, 2021 8:25 AM
> *To:* Jörn Franke 
> *Cc:* Pralabh Kumar ; dev ;
> user.spark 
> *Subject:* Re: Log4j 1.2.17 spark CVE
>
> This has come up several times over the years - search JIRA. The very short
> summary is: Spark does not use log4j 1.x, but its dependencies do, and
> that's the issue.
> Anyone that can successfully complete the surgery at this point is welcome
> to, but I failed ~2 years ago.
>
> On Mon, Dec 13, 2021 at 10:02 AM Jörn Franke  wrote:
>
> Is it appropriate in any case to use log4j 1.x, which is no longer
> maintained and has other security vulnerabilities that won't be fixed?
>
> On 13.12.2021 at 06:06, Sean Owen wrote:
>
> 
> Check the CVE - the log4j vulnerability appears to affect log4j 2, not
> 1.x. There was mention that it could affect 1.x when used with JNDI or SMS
> handlers, but Spark does neither. (Unless anyone can think of something I'm
> missing, I've never heard of or seen that come up at all in 7 years in Spark.)
>
> The big issue would be applications that themselves configure log4j 2.x,
> but that's not a Spark issue per se.
>
> On Sun, Dec 12, 2021 at 10:46 PM Pralabh Kumar 
> wrote:
>
> Hi developers,  users
>
> Spark is built using log4j 1.2.17. Is there a plan to upgrade, given the
> recently detected CVE?
>
>
> Regards
> Pralabh kumar
>
>


Re: Log4j 1.2.17 spark CVE

2021-12-13 Thread James Yu
Question: Spark uses log4j 1.2.17. If my application jar contains log4j 2.x and
gets submitted to the Spark cluster, which version of log4j actually gets used
during the Spark session?
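
One empirical way to answer this kind of question (a generic JVM sketch, not a Spark API; the helper name is made up) is to probe, from inside the session, which log4j entry points the current classloader can actually see:

```scala
// Hypothetical diagnostic: check which log4j APIs are visible on the classpath.
// Paste into spark-shell (or the driver) to see which logging jars "won".
def classPresent(name: String): Boolean =
  try { Class.forName(name); true }
  catch { case _: ClassNotFoundException => false }

val hasLog4j1 = classPresent("org.apache.log4j.Logger")             // log4j 1.x API
val hasLog4j2 = classPresent("org.apache.logging.log4j.LogManager") // log4j 2.x API
println(s"log4j 1.x visible: $hasLog4j1, log4j 2.x visible: $hasLog4j2")
```

In a plain JVM without either jar, both probes print `false`; inside a Spark session the result depends on what was bundled and how it was submitted.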

From: Sean Owen 
Sent: Monday, December 13, 2021 8:25 AM
To: Jörn Franke 
Cc: Pralabh Kumar ; dev ; 
user.spark 
Subject: Re: Log4j 1.2.17 spark CVE

This has come up several times over the years - search JIRA. The very short summary
is: Spark does not use log4j 1.x, but its dependencies do, and that's the issue.
Anyone that can successfully complete the surgery at this point is welcome to, 
but I failed ~2 years ago.

On Mon, Dec 13, 2021 at 10:02 AM Jörn Franke <jornfra...@gmail.com> wrote:
Is it appropriate in any case to use log4j 1.x, which is no longer maintained
and has other security vulnerabilities that won't be fixed?

On 13.12.2021 at 06:06, Sean Owen <sro...@gmail.com> wrote:


Check the CVE - the log4j vulnerability appears to affect log4j 2, not 1.x. 
There was mention that it could affect 1.x when used with JNDI or SMS handlers, 
but Spark does neither. (Unless anyone can think of something I'm missing, I've
never heard of or seen that come up at all in 7 years in Spark.)

The big issue would be applications that themselves configure log4j 2.x, but 
that's not a Spark issue per se.

On Sun, Dec 12, 2021 at 10:46 PM Pralabh Kumar <pralabhku...@gmail.com> wrote:
Hi developers,  users

Spark is built using log4j 1.2.17. Is there a plan to upgrade, given the
recently detected CVE?


Regards
Pralabh kumar


Re: spark 3.2.0 the different dataframe createOrReplaceTempView the same name TempView

2021-12-13 Thread Daniel de Oliveira Mantovani
You are correct, I understand. My only concern is the backward-compatibility
problem: this worked in previous versions of Apache Spark. It's painful when
an out-of-the-box feature breaks without documentation or a workaround like a
"spark.sql.legacy.keepSqlRecursive" true/false flag. It's not about "my
code"; it's about all the production code running out there.

Thank you so much

On Mon, Dec 13, 2021 at 2:32 PM Sean Owen  wrote:

> I think we're going around in circles - you should not do this. You essentially
> have "__TABLE__ = SELECT * FROM __TABLE__" and I hope it's clear why that
> can't work in general.
> At first execution, sure, maybe "old" __TABLE__ refers to "SELECT 1", but
> what about the second time? If you stick to that interpretation, it's
> actually not executing correctly, though it 'works'. If you execute it as is,
> it fails for circularity. Both are bad, so it's just disallowed.
> Just fix your code?
>
> On Mon, Dec 13, 2021 at 11:27 AM Daniel de Oliveira Mantovani <
> daniel.oliveira.mantov...@gmail.com> wrote:
>
>> I've reduced the code to reproduce the issue,
>>
>> val df = spark.sql("SELECT 1")
>> df.createOrReplaceTempView("__TABLE__")
>> spark.sql("SELECT * FROM __TABLE__").show
>> val df2 = spark.sql("SELECT *,2 FROM __TABLE__")
>> df2.createOrReplaceTempView("__TABLE__") // Exception in Spark 3.2 but
>> // works for Spark 2.4.x and Spark 3.1.x
>> spark.sql("SELECT * FROM __TABLE__").show
>>
>> org.apache.spark.sql.AnalysisException: Recursive view `__TABLE__`
>> detected (cycle: `__TABLE__` -> `__TABLE__`)
>>   at
>> org.apache.spark.sql.errors.QueryCompilationErrors$.recursiveViewDetectedError(QueryCompilationErrors.scala:2045)
>>   at
>> org.apache.spark.sql.execution.command.ViewHelper$.checkCyclicViewReference(views.scala:515)
>>   at
>> org.apache.spark.sql.execution.command.ViewHelper$.$anonfun$checkCyclicViewReference$2(views.scala:522)
>>   at
>> org.apache.spark.sql.execution.command.ViewHelper$.$anonfun$checkCyclicViewReference$2$adapted(views.scala:522)
>>
>> On Mon, Dec 13, 2021 at 2:10 PM Sean Owen  wrote:
>>
>>> _shrug_ I think this is a bug fix, unless I am missing something here.
>>> You shouldn't just use __TABLE__ for everything, and I'm not seeing a good
>>> reason to do that other than it's what you do now.
>>> I'm not clear if it's coming across that this _can't_ work in the
>>> general case.
>>>
>>> On Mon, Dec 13, 2021 at 11:03 AM Daniel de Oliveira Mantovani <
>>> daniel.oliveira.mantov...@gmail.com> wrote:
>>>

 In this context, I don't want to worry about the name of the temporary
 table. That's why it is "__TABLE__".
 The point is that this behavior in Spark 3.2.x breaks backward
 compatibility with all previous versions of Apache Spark. In my opinion we
 should at least have some flag like "spark.sql.legacy.keepSqlRecursive"
 true/false.

>>>
>>
>> --
>>
>> --
>> Daniel Mantovani
>>
>>

-- 

--
Daniel Mantovani


Re: spark 3.2.0 the different dataframe createOrReplaceTempView the same name TempView

2021-12-13 Thread Sean Owen
I think we're going around in circles - you should not do this. You essentially
have "__TABLE__ = SELECT * FROM __TABLE__" and I hope it's clear why that
can't work in general.
At first execution, sure, maybe "old" __TABLE__ refers to "SELECT 1", but
what about the second time? If you stick to that interpretation, it's
actually not executing correctly, though it 'works'. If you execute it as is,
it fails for circularity. Both are bad, so it's just disallowed.
Just fix your code?
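
The cycle being rejected can be sketched with a toy registry. This is only an illustration of the idea, not Spark's actual ViewHelper/checkCyclicViewReference code, and all the names below are made up:

```scala
// Toy model: each view records the view names its plan reads from. Replacing
// __TABLE__ with a plan that itself reads __TABLE__ produces a self-cycle,
// which is exactly what Spark 3.2's analyzer now rejects up front.
final case class ViewDef(name: String, dependsOn: Set[String])

def hasCycle(views: Map[String, ViewDef], start: String): Boolean = {
  def visit(name: String, seen: Set[String]): Boolean =
    views.get(name).exists(_.dependsOn.exists(d => seen(d) || visit(d, seen + d)))
  visit(start, Set(start))
}

// "SELECT *,2 FROM __TABLE__" registered back under the name __TABLE__:
val cyclic = Map("__TABLE__" -> ViewDef("__TABLE__", Set("__TABLE__")))

// The same query registered under a fresh name has no cycle:
val acyclic = Map(
  "__TABLE_2__" -> ViewDef("__TABLE_2__", Set("__TABLE__")),
  "__TABLE__"   -> ViewDef("__TABLE__", Set.empty)
)
```

`hasCycle(cyclic, "__TABLE__")` is true, while the fresh-name version is acyclic, which is why renaming the view at each step sidesteps the error.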

On Mon, Dec 13, 2021 at 11:27 AM Daniel de Oliveira Mantovani <
daniel.oliveira.mantov...@gmail.com> wrote:

> I've reduced the code to reproduce the issue,
>
> val df = spark.sql("SELECT 1")
> df.createOrReplaceTempView("__TABLE__")
> spark.sql("SELECT * FROM __TABLE__").show
> val df2 = spark.sql("SELECT *,2 FROM __TABLE__")
> df2.createOrReplaceTempView("__TABLE__") // Exception in Spark 3.2 but
> // works for Spark 2.4.x and Spark 3.1.x
> spark.sql("SELECT * FROM __TABLE__").show
>
> org.apache.spark.sql.AnalysisException: Recursive view `__TABLE__`
> detected (cycle: `__TABLE__` -> `__TABLE__`)
>   at
> org.apache.spark.sql.errors.QueryCompilationErrors$.recursiveViewDetectedError(QueryCompilationErrors.scala:2045)
>   at
> org.apache.spark.sql.execution.command.ViewHelper$.checkCyclicViewReference(views.scala:515)
>   at
> org.apache.spark.sql.execution.command.ViewHelper$.$anonfun$checkCyclicViewReference$2(views.scala:522)
>   at
> org.apache.spark.sql.execution.command.ViewHelper$.$anonfun$checkCyclicViewReference$2$adapted(views.scala:522)
>
> On Mon, Dec 13, 2021 at 2:10 PM Sean Owen  wrote:
>
>> _shrug_ I think this is a bug fix, unless I am missing something here.
>> You shouldn't just use __TABLE__ for everything, and I'm not seeing a good
>> reason to do that other than it's what you do now.
>> I'm not clear if it's coming across that this _can't_ work in the general
>> case.
>>
>> On Mon, Dec 13, 2021 at 11:03 AM Daniel de Oliveira Mantovani <
>> daniel.oliveira.mantov...@gmail.com> wrote:
>>
>>>
>>> In this context, I don't want to worry about the name of the temporary
>>> table. That's why it is "__TABLE__".
>>> The point is that this behavior in Spark 3.2.x breaks backward
>>> compatibility with all previous versions of Apache Spark. In my opinion we
>>> should at least have some flag like "spark.sql.legacy.keepSqlRecursive"
>>> true/false.
>>>
>>
>
> --
>
> --
> Daniel Mantovani
>
>


Re: spark 3.2.0 the different dataframe createOrReplaceTempView the same name TempView

2021-12-13 Thread Daniel de Oliveira Mantovani
I've reduced the code to reproduce the issue,

val df = spark.sql("SELECT 1")
df.createOrReplaceTempView("__TABLE__")
spark.sql("SELECT * FROM __TABLE__").show
val df2 = spark.sql("SELECT *,2 FROM __TABLE__")
df2.createOrReplaceTempView("__TABLE__") // Exception in Spark 3.2 but
// works for Spark 2.4.x and Spark 3.1.x
spark.sql("SELECT * FROM __TABLE__").show

org.apache.spark.sql.AnalysisException: Recursive view `__TABLE__` detected
(cycle: `__TABLE__` -> `__TABLE__`)
  at
org.apache.spark.sql.errors.QueryCompilationErrors$.recursiveViewDetectedError(QueryCompilationErrors.scala:2045)
  at
org.apache.spark.sql.execution.command.ViewHelper$.checkCyclicViewReference(views.scala:515)
  at
org.apache.spark.sql.execution.command.ViewHelper$.$anonfun$checkCyclicViewReference$2(views.scala:522)
  at
org.apache.spark.sql.execution.command.ViewHelper$.$anonfun$checkCyclicViewReference$2$adapted(views.scala:522)
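
One possible workaround for this repro (a hypothetical helper, not something Spark or the Almaren Framework provides) is to mint a fresh view name per step instead of reusing "__TABLE__", so no definition ever references its own name:

```scala
import java.util.concurrent.atomic.AtomicLong

// Hypothetical helper: generate unique temp-view names so each
// createOrReplaceTempView registers a new name and never shadows a view
// that its own plan still reads from.
object ViewNames {
  private val counter = new AtomicLong(0)
  def fresh(prefix: String = "TBL"): String =
    s"${prefix}_${counter.incrementAndGet()}"
}

// Usage sketch against the repro above (assumes a SparkSession `spark`):
//   val v1 = ViewNames.fresh(); df.createOrReplaceTempView(v1)
//   val df2 = spark.sql(s"SELECT *, 2 FROM $v1")
//   val v2 = ViewNames.fresh(); df2.createOrReplaceTempView(v2)
```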

On Mon, Dec 13, 2021 at 2:10 PM Sean Owen  wrote:

> _shrug_ I think this is a bug fix, unless I am missing something here. You
> shouldn't just use __TABLE__ for everything, and I'm not seeing a good
> reason to do that other than it's what you do now.
> I'm not clear if it's coming across that this _can't_ work in the general
> case.
>
> On Mon, Dec 13, 2021 at 11:03 AM Daniel de Oliveira Mantovani <
> daniel.oliveira.mantov...@gmail.com> wrote:
>
>>
>> In this context, I don't want to worry about the name of the temporary
>> table. That's why it is "__TABLE__".
>> The point is that this behavior in Spark 3.2.x breaks backward
>> compatibility with all previous versions of Apache Spark. In my opinion we
>> should at least have some flag like "spark.sql.legacy.keepSqlRecursive"
>> true/false.
>>
>

-- 

--
Daniel Mantovani


Re: spark 3.2.0 the different dataframe createOrReplaceTempView the same name TempView

2021-12-13 Thread Sean Owen
_shrug_ I think this is a bug fix, unless I am missing something here. You
shouldn't just use __TABLE__ for everything, and I'm not seeing a good
reason to do that other than it's what you do now.
I'm not clear if it's coming across that this _can't_ work in the general
case.

On Mon, Dec 13, 2021 at 11:03 AM Daniel de Oliveira Mantovani <
daniel.oliveira.mantov...@gmail.com> wrote:

>
> In this context, I don't want to worry about the name of the temporary
> table. That's why it is "__TABLE__".
> The point is that this behavior in Spark 3.2.x breaks backward
> compatibility with all previous versions of Apache Spark. In my opinion we
> should at least have some flag like "spark.sql.legacy.keepSqlRecursive"
> true/false.
>


Re: spark 3.2.0 the different dataframe createOrReplaceTempView the same name TempView

2021-12-13 Thread Daniel de Oliveira Mantovani
In this context, I don't want to worry about the name of the temporary
table. That's why it is "__TABLE__".
The point is that this behavior in Spark 3.2.x breaks backward
compatibility with all previous versions of Apache Spark. In my opinion we
should at least have some flag like "spark.sql.legacy.keepSqlRecursive"
true/false.

On Mon, Dec 13, 2021 at 1:47 PM Sean Owen  wrote:

> You can replace temp views. Again: what you can't do here is define a temp
> view in terms of itself. If you are reusing the same name over and over,
> it's probably easy to do that, so you don't want to do that. You want
> different names for different temp views, or else ensure you aren't doing
> the kind of thing shown in the SO post. You get the problem right?
>
> On Mon, Dec 13, 2021 at 10:43 AM Daniel de Oliveira Mantovani <
> daniel.oliveira.mantov...@gmail.com> wrote:
>
>> I didn't post the SO issue; I've just found the same exception I'm facing
>> in Spark 3.2. The Almaren Framework has a concept of creating temporary
>> views with the name "__TABLE__".
>>
>> For example, if you want to use SQL on a DataFrame to join a table, run an
>> aggregation, or apply a function, instead of creating a named temporary
>> table you just use the "__TABLE__" alias. You don't really care about the
>> name of the table. You may use this "__TABLE__" approach in different
>> parts of your code.
>>
>> Why can't I create or replace temporary views in different DataFrames
>> with the same name as before?
>>
>>
>>
>> On Mon, Dec 13, 2021 at 1:27 PM Sean Owen  wrote:
>>
>>> If the issue is what you posted in SO, I think the stack trace explains
>>> it already. You want to avoid this recursive definition, which in general
>>> can't work.
>>> I think it's simply explicitly disallowed in all cases now, but you
>>> should not be depending on this anyway - why can't this just be avoided?
>>>
>>> On Mon, Dec 13, 2021 at 10:06 AM Daniel de Oliveira Mantovani <
>>> daniel.oliveira.mantov...@gmail.com> wrote:
>>>
 Sean,

 https://github.com/music-of-the-ainur/almaren-framework/tree/spark-3.2

 Just executing "sbt test" will reproduce the error. The same code works
 for Spark 2.3.x, 2.4.x, and 3.1.x; why doesn't it work for Spark 3.2?

 Thank you so much



 On Mon, Dec 13, 2021 at 12:59 PM Sean Owen  wrote:

> ... but the error is not "because that already exists". See your stack
> trace. It's because the definition is recursive. You define temp view
> test1, create a second DF from it, and then redefine test1 as that result.
> test1 depends on test1.
>
> On Mon, Dec 13, 2021 at 9:58 AM Daniel de Oliveira Mantovani <
> daniel.oliveira.mantov...@gmail.com> wrote:
>
>> Sean,
>>
>> The method name is very clear: "createOrReplaceTempView". It doesn't make
>> sense to throw an exception because the view already exists. Spark
>> 3.2.x is breaking backward compatibility for no reason.
>>
>>
>> On Mon, Dec 13, 2021 at 12:53 PM Sean Owen  wrote:
>>
>>> The error looks 'valid' - you define a temp view in terms of its own
>>> previous version, which doesn't quite make sense - somewhere the new
>>> definition depends on the old definition. I think it just correctly
>>> surfaces as an error now.
>>>
>>> On Mon, Dec 13, 2021 at 9:41 AM Daniel de Oliveira Mantovani <
>>> daniel.oliveira.mantov...@gmail.com> wrote:
>>>
 Hello team,

 I've found this issue while I was porting my project from Apache
 Spark 3.1.x to 3.2.x.


 https://stackoverflow.com/questions/69937415/spark-3-2-0-the-different-dataframe-createorreplacetempview-the-same-name-tempvi

 Do we have a bug for that in apache-spark, or do I need to create one?

 Thank you so much

 [info] com.github.music.of.the.ainur.almaren.Test *** ABORTED ***
 [info]   org.apache.spark.sql.AnalysisException: Recursive view
 `__TABLE__` detected (cycle: `__TABLE__` -> `__TABLE__`)
 [info]   at
 org.apache.spark.sql.errors.QueryCompilationErrors$.recursiveViewDetectedError(QueryCompilationErrors.scala:2045)
 [info]   at
 org.apache.spark.sql.execution.command.ViewHelper$.checkCyclicViewReference(views.scala:515)
 [info]   at
 org.apache.spark.sql.execution.command.ViewHelper$.$anonfun$checkCyclicViewReference$2(views.scala:522)
 [info]   at
 org.apache.spark.sql.execution.command.ViewHelper$.$anonfun$checkCyclicViewReference$2$adapted(views.scala:522)
 [info]   at scala.collection.Iterator.foreach(Iterator.scala:941)
 [info]   at scala.collection.Iterator.foreach$(Iterator.scala:941)
 [info]   at
 scala.collection.AbstractIterator.foreach(Iterator.scala:1429)
 [info]   at
 scala.collection.IterableLike.foreach(IterableLike.scala:74)
 [info] 

Re: spark 3.2.0 the different dataframe createOrReplaceTempView the same name TempView

2021-12-13 Thread Sean Owen
You can replace temp views. Again: what you can't do here is define a temp
view in terms of itself. If you are reusing the same name over and over,
it's probably easy to do that, so you don't want to do that. You want
different names for different temp views, or else ensure you aren't doing
the kind of thing shown in the SO post. You get the problem right?

On Mon, Dec 13, 2021 at 10:43 AM Daniel de Oliveira Mantovani <
daniel.oliveira.mantov...@gmail.com> wrote:

> I didn't post the SO issue; I've just found the same exception I'm facing
> in Spark 3.2. The Almaren Framework has a concept of creating temporary
> views with the name "__TABLE__".
>
> For example, if you want to use SQL on a DataFrame to join a table, run an
> aggregation, or apply a function, instead of creating a named temporary
> table you just use the "__TABLE__" alias. You don't really care about the
> name of the table. You may use this "__TABLE__" approach in different
> parts of your code.
>
> Why can't I create or replace temporary views in different DataFrames
> with the same name as before?
>
>
>
> On Mon, Dec 13, 2021 at 1:27 PM Sean Owen  wrote:
>
>> If the issue is what you posted in SO, I think the stack trace explains
>> it already. You want to avoid this recursive definition, which in general
>> can't work.
>> I think it's simply explicitly disallowed in all cases now, but you
>> should not be depending on this anyway - why can't this just be avoided?
>>
>> On Mon, Dec 13, 2021 at 10:06 AM Daniel de Oliveira Mantovani <
>> daniel.oliveira.mantov...@gmail.com> wrote:
>>
>>> Sean,
>>>
>>> https://github.com/music-of-the-ainur/almaren-framework/tree/spark-3.2
>>>
>>> Just executing "sbt test" will reproduce the error. The same code works
>>> for Spark 2.3.x, 2.4.x, and 3.1.x; why doesn't it work for Spark 3.2?
>>>
>>> Thank you so much
>>>
>>>
>>>
>>> On Mon, Dec 13, 2021 at 12:59 PM Sean Owen  wrote:
>>>
 ... but the error is not "because that already exists". See your stack
 trace. It's because the definition is recursive. You define temp view
 test1, create a second DF from it, and then redefine test1 as that result.
 test1 depends on test1.

 On Mon, Dec 13, 2021 at 9:58 AM Daniel de Oliveira Mantovani <
 daniel.oliveira.mantov...@gmail.com> wrote:

> Sean,
>
> The method name is very clear: "createOrReplaceTempView". It doesn't make
> sense to throw an exception because the view already exists. Spark
> 3.2.x is breaking backward compatibility for no reason.
>
>
> On Mon, Dec 13, 2021 at 12:53 PM Sean Owen  wrote:
>
>> The error looks 'valid' - you define a temp view in terms of its own
>> previous version, which doesn't quite make sense - somewhere the new
>> definition depends on the old definition. I think it just correctly
>> surfaces as an error now.
>>
>> On Mon, Dec 13, 2021 at 9:41 AM Daniel de Oliveira Mantovani <
>> daniel.oliveira.mantov...@gmail.com> wrote:
>>
>>> Hello team,
>>>
>>> I've found this issue while I was porting my project from Apache
>>> Spark 3.1.x to 3.2.x.
>>>
>>>
>>> https://stackoverflow.com/questions/69937415/spark-3-2-0-the-different-dataframe-createorreplacetempview-the-same-name-tempvi
>>>
>>> Do we have a bug for that in apache-spark, or do I need to create one?
>>>
>>> Thank you so much
>>>
>>> [info] com.github.music.of.the.ainur.almaren.Test *** ABORTED ***
>>> [info]   org.apache.spark.sql.AnalysisException: Recursive view
>>> `__TABLE__` detected (cycle: `__TABLE__` -> `__TABLE__`)
>>> [info]   at
>>> org.apache.spark.sql.errors.QueryCompilationErrors$.recursiveViewDetectedError(QueryCompilationErrors.scala:2045)
>>> [info]   at
>>> org.apache.spark.sql.execution.command.ViewHelper$.checkCyclicViewReference(views.scala:515)
>>> [info]   at
>>> org.apache.spark.sql.execution.command.ViewHelper$.$anonfun$checkCyclicViewReference$2(views.scala:522)
>>> [info]   at
>>> org.apache.spark.sql.execution.command.ViewHelper$.$anonfun$checkCyclicViewReference$2$adapted(views.scala:522)
>>> [info]   at scala.collection.Iterator.foreach(Iterator.scala:941)
>>> [info]   at scala.collection.Iterator.foreach$(Iterator.scala:941)
>>> [info]   at
>>> scala.collection.AbstractIterator.foreach(Iterator.scala:1429)
>>> [info]   at
>>> scala.collection.IterableLike.foreach(IterableLike.scala:74)
>>> [info]   at
>>> scala.collection.IterableLike.foreach$(IterableLike.scala:73)
>>> [info]   at
>>> scala.collection.AbstractIterable.foreach(Iterable.scala:56)
>>>
>>> --
>>>
>>> --
>>> Daniel Mantovani
>>>
>>>
>
> --
>
> --
> Daniel Mantovani
>
>
>>>
>>> --
>>>
>>> --
>>> Daniel Mantovani
>>>
>>>
>
> --
>
> --
> Daniel Mantovani
>
>


Re: spark 3.2.0 the different dataframe createOrReplaceTempView the same name TempView

2021-12-13 Thread Daniel de Oliveira Mantovani
I didn't post the SO issue; I've just found the same exception I'm facing
in Spark 3.2. The Almaren Framework has a concept of creating temporary
views with the name "__TABLE__".

For example, if you want to use SQL on a DataFrame to join a table, run an
aggregation, or apply a function, instead of creating a named temporary
table you just use the "__TABLE__" alias. You don't really care about the
name of the table. You may use this "__TABLE__" approach in different
parts of your code.

Why can't I create or replace temporary views in different DataFrames
with the same name as before?



On Mon, Dec 13, 2021 at 1:27 PM Sean Owen  wrote:

> If the issue is what you posted in SO, I think the stack trace explains it
> already. You want to avoid this recursive definition, which in general
> can't work.
> I think it's simply explicitly disallowed in all cases now, but you
> should not be depending on this anyway - why can't this just be avoided?
>
> On Mon, Dec 13, 2021 at 10:06 AM Daniel de Oliveira Mantovani <
> daniel.oliveira.mantov...@gmail.com> wrote:
>
>> Sean,
>>
>> https://github.com/music-of-the-ainur/almaren-framework/tree/spark-3.2
>>
>> Just executing "sbt test" will reproduce the error. The same code works
>> for Spark 2.3.x, 2.4.x, and 3.1.x; why doesn't it work for Spark 3.2?
>>
>> Thank you so much
>>
>>
>>
>> On Mon, Dec 13, 2021 at 12:59 PM Sean Owen  wrote:
>>
>>> ... but the error is not "because that already exists". See your stack
>>> trace. It's because the definition is recursive. You define temp view
>>> test1, create a second DF from it, and then redefine test1 as that result.
>>> test1 depends on test1.
>>>
>>> On Mon, Dec 13, 2021 at 9:58 AM Daniel de Oliveira Mantovani <
>>> daniel.oliveira.mantov...@gmail.com> wrote:
>>>
 Sean,

 The method name is very clear: "createOrReplaceTempView". It doesn't make
 sense to throw an exception because the view already exists. Spark
 3.2.x is breaking backward compatibility for no reason.


 On Mon, Dec 13, 2021 at 12:53 PM Sean Owen  wrote:

> The error looks 'valid' - you define a temp view in terms of its own
> previous version, which doesn't quite make sense - somewhere the new
> definition depends on the old definition. I think it just correctly
> surfaces as an error now.
>
> On Mon, Dec 13, 2021 at 9:41 AM Daniel de Oliveira Mantovani <
> daniel.oliveira.mantov...@gmail.com> wrote:
>
>> Hello team,
>>
>> I've found this issue while I was porting my project from Apache
>> Spark 3.1.x to 3.2.x.
>>
>>
>> https://stackoverflow.com/questions/69937415/spark-3-2-0-the-different-dataframe-createorreplacetempview-the-same-name-tempvi
>>
>> Do we have a bug for that in apache-spark, or do I need to create one?
>>
>> Thank you so much
>>
>> [info] com.github.music.of.the.ainur.almaren.Test *** ABORTED ***
>> [info]   org.apache.spark.sql.AnalysisException: Recursive view
>> `__TABLE__` detected (cycle: `__TABLE__` -> `__TABLE__`)
>> [info]   at
>> org.apache.spark.sql.errors.QueryCompilationErrors$.recursiveViewDetectedError(QueryCompilationErrors.scala:2045)
>> [info]   at
>> org.apache.spark.sql.execution.command.ViewHelper$.checkCyclicViewReference(views.scala:515)
>> [info]   at
>> org.apache.spark.sql.execution.command.ViewHelper$.$anonfun$checkCyclicViewReference$2(views.scala:522)
>> [info]   at
>> org.apache.spark.sql.execution.command.ViewHelper$.$anonfun$checkCyclicViewReference$2$adapted(views.scala:522)
>> [info]   at scala.collection.Iterator.foreach(Iterator.scala:941)
>> [info]   at scala.collection.Iterator.foreach$(Iterator.scala:941)
>> [info]   at
>> scala.collection.AbstractIterator.foreach(Iterator.scala:1429)
>> [info]   at
>> scala.collection.IterableLike.foreach(IterableLike.scala:74)
>> [info]   at
>> scala.collection.IterableLike.foreach$(IterableLike.scala:73)
>> [info]   at
>> scala.collection.AbstractIterable.foreach(Iterable.scala:56)
>>
>> --
>>
>> --
>> Daniel Mantovani
>>
>>

 --

 --
 Daniel Mantovani


>>
>> --
>>
>> --
>> Daniel Mantovani
>>
>>

-- 

--
Daniel Mantovani


Re: spark 3.2.0 the different dataframe createOrReplaceTempView the same name TempView

2021-12-13 Thread Sean Owen
If the issue is what you posted in SO, I think the stack trace explains it
already. You want to avoid this recursive definition, which in general
can't work.
I think it's simply explicitly disallowed in all cases now, but, you should
not be depending on this anyway - why can't this just be avoided?

On Mon, Dec 13, 2021 at 10:06 AM Daniel de Oliveira Mantovani <
daniel.oliveira.mantov...@gmail.com> wrote:

> Sean,
>
> https://github.com/music-of-the-ainur/almaren-framework/tree/spark-3.2
>
> Just executing "sbt test" will reproduce the error. The same code works
> for Spark 2.3.x, 2.4.x, and 3.1.x; why doesn't it work for Spark 3.2?
>
> Thank you so much
>
>
>
> On Mon, Dec 13, 2021 at 12:59 PM Sean Owen  wrote:
>
>> ... but the error is not "because that already exists". See your stack
>> trace. It's because the definition is recursive. You define temp view
>> test1, create a second DF from it, and then redefine test1 as that result.
>> test1 depends on test1.
>>
>> On Mon, Dec 13, 2021 at 9:58 AM Daniel de Oliveira Mantovani <
>> daniel.oliveira.mantov...@gmail.com> wrote:
>>
>>> Sean,
>>>
>>> The method name is very clear: "createOrReplaceTempView". It doesn't make
>>> sense to throw an exception because the view already exists. Spark
>>> 3.2.x is breaking backward compatibility for no reason.
>>>
>>>
>>> On Mon, Dec 13, 2021 at 12:53 PM Sean Owen  wrote:
>>>
 The error looks 'valid' - you define a temp view in terms of its own
 previous version, which doesn't quite make sense - somewhere the new
 definition depends on the old definition. I think it just correctly
 surfaces as an error now.

 On Mon, Dec 13, 2021 at 9:41 AM Daniel de Oliveira Mantovani <
 daniel.oliveira.mantov...@gmail.com> wrote:

> Hello team,
>
> I've found this issue while I was porting my project from Apache Spark
> 3.1.x to 3.2.x.
>
>
> https://stackoverflow.com/questions/69937415/spark-3-2-0-the-different-dataframe-createorreplacetempview-the-same-name-tempvi
>
> Do we have a bug for that in apache-spark, or do I need to create one?
>
> Thank you so much
>
> [info] com.github.music.of.the.ainur.almaren.Test *** ABORTED ***
> [info]   org.apache.spark.sql.AnalysisException: Recursive view
> `__TABLE__` detected (cycle: `__TABLE__` -> `__TABLE__`)
> [info]   at
> org.apache.spark.sql.errors.QueryCompilationErrors$.recursiveViewDetectedError(QueryCompilationErrors.scala:2045)
> [info]   at
> org.apache.spark.sql.execution.command.ViewHelper$.checkCyclicViewReference(views.scala:515)
> [info]   at
> org.apache.spark.sql.execution.command.ViewHelper$.$anonfun$checkCyclicViewReference$2(views.scala:522)
> [info]   at
> org.apache.spark.sql.execution.command.ViewHelper$.$anonfun$checkCyclicViewReference$2$adapted(views.scala:522)
> [info]   at scala.collection.Iterator.foreach(Iterator.scala:941)
> [info]   at scala.collection.Iterator.foreach$(Iterator.scala:941)
> [info]   at
> scala.collection.AbstractIterator.foreach(Iterator.scala:1429)
> [info]   at
> scala.collection.IterableLike.foreach(IterableLike.scala:74)
> [info]   at
> scala.collection.IterableLike.foreach$(IterableLike.scala:73)
> [info]   at
> scala.collection.AbstractIterable.foreach(Iterable.scala:56)
>
> --
>
> --
> Daniel Mantovani
>
>
>>>
>>> --
>>>
>>> --
>>> Daniel Mantovani
>>>
>>>
>
> --
>
> --
> Daniel Mantovani
>
>


Re: Log4j 1.2.17 spark CVE

2021-12-13 Thread Sean Owen
This has come up several times over the years - search JIRA. The very short
summary is: Spark does not use log4j 1.x, but its dependencies do, and
that's the issue.
Anyone that can successfully complete the surgery at this point is welcome
to, but I failed ~2 years ago.

On Mon, Dec 13, 2021 at 10:02 AM Jörn Franke  wrote:

> Is it appropriate in any case to use log4j 1.x, which is no longer
> maintained and has other security vulnerabilities that won't be fixed?
>
> On 13.12.2021 at 06:06, Sean Owen wrote:
>
> 
> Check the CVE - the log4j vulnerability appears to affect log4j 2, not
> 1.x. There was mention that it could affect 1.x when used with JNDI or SMS
> handlers, but Spark does neither. (Unless anyone can think of something I'm
> missing, I've never heard of or seen that come up at all in 7 years in Spark.)
>
> The big issue would be applications that themselves configure log4j 2.x,
> but that's not a Spark issue per se.
>
> On Sun, Dec 12, 2021 at 10:46 PM Pralabh Kumar 
> wrote:
>
>> Hi developers,  users
>>
>> Spark is built using log4j 1.2.17. Is there a plan to upgrade, given the
>> recently detected CVE?
>>
>>
>> Regards
>> Pralabh kumar
>>
>


Re: Log4j 1.2.17 spark CVE

2021-12-13 Thread Martin Wunderlich
There is a discussion on GitHub on this topic, and the recommendation is
to upgrade from 1.x to 2.15.0, due to the vulnerability of 1.x:
https://github.com/apache/logging-log4j2/pull/608


This discussion is also referenced by the German Federal Office for 
Information Security: https://www.bsi.bund.de/EN/Home/home_node.html


Cheers,

Martin

On 13.12.21 at 17:02, Jörn Franke wrote:
Is it appropriate in any case to use log4j 1.x, which is no longer
maintained and has other security vulnerabilities that won't be fixed?



Am 13.12.2021 um 06:06 schrieb Sean Owen :


Check the CVE - the log4j vulnerability appears to affect log4j 2,
not 1.x. There was mention that it could affect 1.x when used with
JNDI or SMS handlers, but Spark does neither. (Unless anyone can
think of something I'm missing, I've never heard of or seen that come
up at all in 7 years in Spark.)


The big issue would be applications that themselves configure log4j 
2.x, but that's not a Spark issue per se.


On Sun, Dec 12, 2021 at 10:46 PM Pralabh Kumar wrote:


Hi developers,  users

Spark is built using log4j 1.2.17. Is there a plan to upgrade,
given the recently detected CVE?


Regards
Pralabh kumar


Re: spark 3.2.0 the different dataframe createOrReplaceTempView the same name TempView

2021-12-13 Thread Daniel de Oliveira Mantovani
Sean,

https://github.com/music-of-the-ainur/almaren-framework/tree/spark-3.2

Just executing "sbt test" will reproduce the error. The same code works for
Spark 2.3.x, 2.4.x, and 3.1.x; why doesn't it work for Spark 3.2?

Thank you so much



On Mon, Dec 13, 2021 at 12:59 PM Sean Owen  wrote:

> ... but the error is not "because that already exists". See your stack
> trace. It's because the definition is recursive. You define temp view
> test1, create a second DF from it, and then redefine test1 as that result.
> test1 depends on test1.
>
> On Mon, Dec 13, 2021 at 9:58 AM Daniel de Oliveira Mantovani <
> daniel.oliveira.mantov...@gmail.com> wrote:
>
>> Sean,
>>
>> The method name "createOrReplaceTempView" is very clear; it doesn't make any
>> sense to throw an exception because the view already exists. Spark 3.2.x
>> is breaking backward compatibility for no reason.
>>
>>
>> On Mon, Dec 13, 2021 at 12:53 PM Sean Owen  wrote:
>>
>>> The error looks 'valid' - you define a temp view in terms of its own
>>> previous version, which doesn't quite make sense - somewhere the new
>>> definition depends on the old definition. I think it just correctly
>>> surfaces as an error now.
>>>
>>> On Mon, Dec 13, 2021 at 9:41 AM Daniel de Oliveira Mantovani <
>>> daniel.oliveira.mantov...@gmail.com> wrote:
>>>
 Hello team,

 I've found this issue while I was porting my project from Apache Spark
 3.1.x to 3.2.x.


 https://stackoverflow.com/questions/69937415/spark-3-2-0-the-different-dataframe-createorreplacetempview-the-same-name-tempvi

 Do we have a bug for that in apache-spark, or do I need to create one?

 Thank you so much

 [info] com.github.music.of.the.ainur.almaren.Test *** ABORTED ***
 [info]   org.apache.spark.sql.AnalysisException: Recursive view
 `__TABLE__` detected (cycle: `__TABLE__` -> `__TABLE__`)
 [info]   at
 org.apache.spark.sql.errors.QueryCompilationErrors$.recursiveViewDetectedError(QueryCompilationErrors.scala:2045)
 [info]   at
 org.apache.spark.sql.execution.command.ViewHelper$.checkCyclicViewReference(views.scala:515)
 [info]   at
 org.apache.spark.sql.execution.command.ViewHelper$.$anonfun$checkCyclicViewReference$2(views.scala:522)
 [info]   at
 org.apache.spark.sql.execution.command.ViewHelper$.$anonfun$checkCyclicViewReference$2$adapted(views.scala:522)
 [info]   at scala.collection.Iterator.foreach(Iterator.scala:941)
 [info]   at scala.collection.Iterator.foreach$(Iterator.scala:941)
 [info]   at
 scala.collection.AbstractIterator.foreach(Iterator.scala:1429)
 [info]   at scala.collection.IterableLike.foreach(IterableLike.scala:74)
 [info]   at
 scala.collection.IterableLike.foreach$(IterableLike.scala:73)
 [info]   at scala.collection.AbstractIterable.foreach(Iterable.scala:56)

 --

 --
 Daniel Mantovani


>>
>> --
>>
>> --
>> Daniel Mantovani
>>
>>

-- 

--
Daniel Mantovani


Re: Log4j 1.2.17 spark CVE

2021-12-13 Thread Jörn Franke
Is it in any case appropriate to use log4j 1.x, which is not maintained anymore 
and has other security vulnerabilities that won't be fixed?

> Am 13.12.2021 um 06:06 schrieb Sean Owen :
> 
> 
> Check the CVE - the log4j vulnerability appears to affect log4j 2, not 1.x. 
> There was mention that it could affect 1.x when used with JNDI or JMS 
> handlers, but Spark does neither. (unless anyone can think of something I'm 
> missing, but I've never heard or seen it come up at all in 7 years in Spark)
> 
> The big issue would be applications that themselves configure log4j 2.x, but 
> that's not a Spark issue per se.
> 
>> On Sun, Dec 12, 2021 at 10:46 PM Pralabh Kumar  
>> wrote:
>> Hi developers, users
>>
>> Spark is built using log4j 1.2.17. Is there a plan to upgrade based on the
>> recently detected CVE?
>> 
>> 
>> Regards
>> Pralabh kumar


Re: spark 3.2.0 the different dataframe createOrReplaceTempView the same name TempView

2021-12-13 Thread Sean Owen
... but the error is not "because that already exists". See your stack
trace. It's because the definition is recursive. You define temp view
test1, create a second DF from it, and then redefine test1 as that result.
test1 depends on test1.

On Mon, Dec 13, 2021 at 9:58 AM Daniel de Oliveira Mantovani <
daniel.oliveira.mantov...@gmail.com> wrote:

> Sean,
>
> The method name "createOrReplaceTempView" is very clear; it doesn't make any
> sense to throw an exception because the view already exists. Spark 3.2.x
> is breaking backward compatibility for no reason.
>
>
> On Mon, Dec 13, 2021 at 12:53 PM Sean Owen  wrote:
>
>> The error looks 'valid' - you define a temp view in terms of its own
>> previous version, which doesn't quite make sense - somewhere the new
>> definition depends on the old definition. I think it just correctly
>> surfaces as an error now.
>>
>> On Mon, Dec 13, 2021 at 9:41 AM Daniel de Oliveira Mantovani <
>> daniel.oliveira.mantov...@gmail.com> wrote:
>>
>>> Hello team,
>>>
>>> I've found this issue while I was porting my project from Apache Spark
>>> 3.1.x to 3.2.x.
>>>
>>>
>>> https://stackoverflow.com/questions/69937415/spark-3-2-0-the-different-dataframe-createorreplacetempview-the-same-name-tempvi
>>>
>>> Do we have a bug for that in apache-spark, or do I need to create one?
>>>
>>> Thank you so much
>>>
>>> [info] com.github.music.of.the.ainur.almaren.Test *** ABORTED ***
>>> [info]   org.apache.spark.sql.AnalysisException: Recursive view
>>> `__TABLE__` detected (cycle: `__TABLE__` -> `__TABLE__`)
>>> [info]   at
>>> org.apache.spark.sql.errors.QueryCompilationErrors$.recursiveViewDetectedError(QueryCompilationErrors.scala:2045)
>>> [info]   at
>>> org.apache.spark.sql.execution.command.ViewHelper$.checkCyclicViewReference(views.scala:515)
>>> [info]   at
>>> org.apache.spark.sql.execution.command.ViewHelper$.$anonfun$checkCyclicViewReference$2(views.scala:522)
>>> [info]   at
>>> org.apache.spark.sql.execution.command.ViewHelper$.$anonfun$checkCyclicViewReference$2$adapted(views.scala:522)
>>> [info]   at scala.collection.Iterator.foreach(Iterator.scala:941)
>>> [info]   at scala.collection.Iterator.foreach$(Iterator.scala:941)
>>> [info]   at
>>> scala.collection.AbstractIterator.foreach(Iterator.scala:1429)
>>> [info]   at scala.collection.IterableLike.foreach(IterableLike.scala:74)
>>> [info]   at scala.collection.IterableLike.foreach$(IterableLike.scala:73)
>>> [info]   at scala.collection.AbstractIterable.foreach(Iterable.scala:56)
>>>
>>> --
>>>
>>> --
>>> Daniel Mantovani
>>>
>>>
>
> --
>
> --
> Daniel Mantovani
>
>


Re: spark 3.2.0 the different dataframe createOrReplaceTempView the same name TempView

2021-12-13 Thread Daniel de Oliveira Mantovani
Sean,

The method name "createOrReplaceTempView" is very clear; it doesn't make any
sense to throw an exception because the view already exists. Spark 3.2.x
is breaking backward compatibility for no reason.


On Mon, Dec 13, 2021 at 12:53 PM Sean Owen  wrote:

> The error looks 'valid' - you define a temp view in terms of its own
> previous version, which doesn't quite make sense - somewhere the new
> definition depends on the old definition. I think it just correctly
> surfaces as an error now.
>
> On Mon, Dec 13, 2021 at 9:41 AM Daniel de Oliveira Mantovani <
> daniel.oliveira.mantov...@gmail.com> wrote:
>
>> Hello team,
>>
>> I've found this issue while I was porting my project from Apache Spark
>> 3.1.x to 3.2.x.
>>
>>
>> https://stackoverflow.com/questions/69937415/spark-3-2-0-the-different-dataframe-createorreplacetempview-the-same-name-tempvi
>>
>> Do we have a bug for that in apache-spark, or do I need to create one?
>>
>> Thank you so much
>>
>> [info] com.github.music.of.the.ainur.almaren.Test *** ABORTED ***
>> [info]   org.apache.spark.sql.AnalysisException: Recursive view
>> `__TABLE__` detected (cycle: `__TABLE__` -> `__TABLE__`)
>> [info]   at
>> org.apache.spark.sql.errors.QueryCompilationErrors$.recursiveViewDetectedError(QueryCompilationErrors.scala:2045)
>> [info]   at
>> org.apache.spark.sql.execution.command.ViewHelper$.checkCyclicViewReference(views.scala:515)
>> [info]   at
>> org.apache.spark.sql.execution.command.ViewHelper$.$anonfun$checkCyclicViewReference$2(views.scala:522)
>> [info]   at
>> org.apache.spark.sql.execution.command.ViewHelper$.$anonfun$checkCyclicViewReference$2$adapted(views.scala:522)
>> [info]   at scala.collection.Iterator.foreach(Iterator.scala:941)
>> [info]   at scala.collection.Iterator.foreach$(Iterator.scala:941)
>> [info]   at scala.collection.AbstractIterator.foreach(Iterator.scala:1429)
>> [info]   at scala.collection.IterableLike.foreach(IterableLike.scala:74)
>> [info]   at scala.collection.IterableLike.foreach$(IterableLike.scala:73)
>> [info]   at scala.collection.AbstractIterable.foreach(Iterable.scala:56)
>>
>> --
>>
>> --
>> Daniel Mantovani
>>
>>

-- 

--
Daniel Mantovani


Re: spark 3.2.0 the different dataframe createOrReplaceTempView the same name TempView

2021-12-13 Thread Sean Owen
The error looks 'valid' - you define a temp view in terms of its own
previous version, which doesn't quite make sense - somewhere the new
definition depends on the old definition. I think it just correctly
surfaces as an error now.

On Mon, Dec 13, 2021 at 9:41 AM Daniel de Oliveira Mantovani <
daniel.oliveira.mantov...@gmail.com> wrote:

> Hello team,
>
> I've found this issue while I was porting my project from Apache Spark
> 3.1.x to 3.2.x.
>
>
> https://stackoverflow.com/questions/69937415/spark-3-2-0-the-different-dataframe-createorreplacetempview-the-same-name-tempvi
>
> Do we have a bug for that in apache-spark, or do I need to create one?
>
> Thank you so much
>
> [info] com.github.music.of.the.ainur.almaren.Test *** ABORTED ***
> [info]   org.apache.spark.sql.AnalysisException: Recursive view
> `__TABLE__` detected (cycle: `__TABLE__` -> `__TABLE__`)
> [info]   at
> org.apache.spark.sql.errors.QueryCompilationErrors$.recursiveViewDetectedError(QueryCompilationErrors.scala:2045)
> [info]   at
> org.apache.spark.sql.execution.command.ViewHelper$.checkCyclicViewReference(views.scala:515)
> [info]   at
> org.apache.spark.sql.execution.command.ViewHelper$.$anonfun$checkCyclicViewReference$2(views.scala:522)
> [info]   at
> org.apache.spark.sql.execution.command.ViewHelper$.$anonfun$checkCyclicViewReference$2$adapted(views.scala:522)
> [info]   at scala.collection.Iterator.foreach(Iterator.scala:941)
> [info]   at scala.collection.Iterator.foreach$(Iterator.scala:941)
> [info]   at scala.collection.AbstractIterator.foreach(Iterator.scala:1429)
> [info]   at scala.collection.IterableLike.foreach(IterableLike.scala:74)
> [info]   at scala.collection.IterableLike.foreach$(IterableLike.scala:73)
> [info]   at scala.collection.AbstractIterable.foreach(Iterable.scala:56)
>
> --
>
> --
> Daniel Mantovani
>
>


spark 3.2.0 the different dataframe createOrReplaceTempView the same name TempView

2021-12-13 Thread Daniel de Oliveira Mantovani
Hello team,

I've found this issue while I was porting my project from Apache Spark
3.1.x to 3.2.x.

https://stackoverflow.com/questions/69937415/spark-3-2-0-the-different-dataframe-createorreplacetempview-the-same-name-tempvi

Do we have a bug for that in apache-spark, or do I need to create one?

Thank you so much

[info] com.github.music.of.the.ainur.almaren.Test *** ABORTED ***
[info]   org.apache.spark.sql.AnalysisException: Recursive view `__TABLE__`
detected (cycle: `__TABLE__` -> `__TABLE__`)
[info]   at
org.apache.spark.sql.errors.QueryCompilationErrors$.recursiveViewDetectedError(QueryCompilationErrors.scala:2045)
[info]   at
org.apache.spark.sql.execution.command.ViewHelper$.checkCyclicViewReference(views.scala:515)
[info]   at
org.apache.spark.sql.execution.command.ViewHelper$.$anonfun$checkCyclicViewReference$2(views.scala:522)
[info]   at
org.apache.spark.sql.execution.command.ViewHelper$.$anonfun$checkCyclicViewReference$2$adapted(views.scala:522)
[info]   at scala.collection.Iterator.foreach(Iterator.scala:941)
[info]   at scala.collection.Iterator.foreach$(Iterator.scala:941)
[info]   at scala.collection.AbstractIterator.foreach(Iterator.scala:1429)
[info]   at scala.collection.IterableLike.foreach(IterableLike.scala:74)
[info]   at scala.collection.IterableLike.foreach$(IterableLike.scala:73)
[info]   at scala.collection.AbstractIterable.foreach(Iterable.scala:56)

-- 

--
Daniel Mantovani


Re: About some Spark technical assistance

2021-12-13 Thread sam smith
You were added to the repo to contribute, thanks. I included the Java class
and the paper I am replicating.

Le lun. 13 déc. 2021 à 04:27,  a écrit :

> github url please.
>
> On 2021-12-13 01:06, sam smith wrote:
> > Hello guys,
> >
> > I am replicating a paper's algorithm (graph coloring algorithm) in
> > Spark under Java, and thought about asking you guys for some
> > assistance to validate / review my 600 lines of code. Any volunteers
> > to share the code with?
> > Thanks
>