The issue is that sometimes people explicitly want to put stuff into the
spark package tree just to get at things which Spark has scoped as private
to org.apache.spark. Unless/until the relevant APIs/classes are rescoped to
be public, putting your classes under that package hierarchy lets your own
code get at them. It just confuses stack trace analysis, as it's not
immediately obvious whose code is playing up.
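
To make the trick concrete, here is a minimal sketch using made-up package
names. org.example.engine stands in for org.apache.spark, and Internals,
LocalityHelper and Main are hypothetical, not real Spark classes. It only
illustrates how Scala's package-qualified private behaves:

```scala
package org.example.engine {
  // Hypothetical engine-internal helper. private[engine] makes it visible
  // only to code inside org.example.engine and its subpackages.
  private[engine] object Internals {
    def preferredHosts(partition: Int): Seq[String] = Seq(s"host-$partition")
  }
}

package org.example.engine.thirdparty {
  // External code deliberately placed under the engine's package tree.
  // Because it sits below org.example.engine, it may call Internals.
  object LocalityHelper {
    def hostsFor(partition: Int): Seq[String] =
      org.example.engine.Internals.preferredHosts(partition)
  }
}

package com.example.app {
  object Main {
    def main(args: Array[String]): Unit = {
      // Works: goes through the class that lives in the engine's tree.
      println(org.example.engine.thirdparty.LocalityHelper.hostsFor(3))

      // Would not compile from here, since Internals is private[engine]:
      // org.example.engine.Internals.preferredHosts(3)

      // The class name, and so any stack frame from it, starts with the
      // engine's prefix even though the engine project does not own it.
      println(org.example.engine.thirdparty.LocalityHelper.getClass.getName)
    }
  }
}
```

The "thirdparty" segment is just one possible shape for the common prefix
being discussed; the access rule only needs the class to sit somewhere
under the engine's root package, so an agreed sub-package would still work.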



On Tue, 22 Sep 2020 at 04:03, Dongjoon Hyun <dongjoon.h...@gmail.com> wrote:

> Hi, Steve.
>
> Sure, you can suggest, but I'm wondering how the suggested namespaces are
> able to satisfy the existing visibility rules. Could you give us some
> examples specifically?
>
> > Can I suggest some common prefix for third-party-classes put into the
> > spark package tree, just to make clear that they are external contributions?
>
> Bests,
> Dongjoon.
>
>
> On Mon, Sep 21, 2020 at 6:29 AM Steve Loughran <ste...@cloudera.com.invalid>
> wrote:
>
>>
>> I've just been stack-trace-chasing the 404-in-task-commit code:
>>
>> https://issues.apache.org/jira/browse/HADOOP-17216
>>
>> And although it's got an org.apache.spark. prefix, it's
>> actually org.apache.spark.sql.delta, which lives in github, so the
>> code/issue tracker lives elsewhere.
>>
>> I understand why they've done this -I've done it myself- it's to get at
>> classes package-scoped to spark (
>> https://github.com/hortonworks-spark/cloud-integration/blob/master/spark-cloud-integration/src/main/scala/org/apache/spark/cloudera/ParallelizedWithLocalityRDD.scala
>> )
>>
>> However, it can be confusing and time-wasting.
>>
>> Can I suggest some common prefix for third-party-classes put into the
>> spark package tree, just to make clear that they are external
>> contributions? It will set expectations up all round
>>
>> -Steve
>>
>> (*) Side note: Could whoever maintains that code do retries, which have
>> to have sleeps of >10-15s? We ended up having to do exponential backoff of
>> >90s to make sure the load balancers were clean. The time for a 404 to clear
>> is not "time since file was added", it is "time since last HEAD/GET/COPY
>> request". thx
>>
>
