Re: Is there a way to do conditional group by in spark 2.1.1?

2017-06-10 Thread vaquar khan
Avoid groupBy and use reduceByKey.
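For reference, here is a minimal spark-shell sketch of the approach suggested further down this
thread (derive a single grouping column with when(), then group on it). The column names
field1/field2 and the literal values come from the original question; the sample data and the
groupKey name are illustrative only:

import org.apache.spark.sql.functions.{col, when}

// spark-shell already has spark.implicits._ in scope for toDF
val df = Seq(("foo", "x"), ("baz", "bar"), ("foo", "bar")).toDF("field1", "field2")

// Build one grouping key: field1 when it equals "foo", otherwise field2 when it
// equals "bar"; rows matching neither condition get null as their key.
val withKey = df.withColumn("groupKey",
  when(col("field1") === "foo", col("field1"))
    .when(col("field2") === "bar", col("field2")))

withKey.groupBy("groupKey").count().show()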

Regards,
Vaquar khan

On Jun 4, 2017 8:32 AM, "Guy Cohen"  wrote:

> Try this one:
>
> df.groupBy(
>   when(expr("field1='foo'"),"field1").when(expr("field2='bar'"),"field2"))
>
>
> On Sun, Jun 4, 2017 at 3:16 AM, Bryan Jeffrey 
> wrote:
>
>> You should be able to project a new column that is your group column.
>> Then you can group on the projected column.
>>
>> Get Outlook for Android 
>>
>>
>>
>>
>> On Sat, Jun 3, 2017 at 6:26 PM -0400, "upendra 1991" <
>> upendra1...@yahoo.com.invalid> wrote:
>>
>> Use a function
>>>
>>> Sent from Yahoo Mail on Android
>>> 
>>>
>>> On Sat, Jun 3, 2017 at 5:01 PM, kant kodali
>>>  wrote:
>>> Hi All,
>>>
>>> Is there a way to do a conditional group by in Spark 2.1.1? In other words, I
>>> want to do something like this:
>>>
>>> if (field1 == "foo") {
>>>   df.groupBy(field1)
>>> } else if (field2 == "bar") {
>>>   df.groupBy(field2)
>>> }
>>>
>>> Thanks
>>>
>>>
>


Re: Spark Job is stuck at SUBMITTED when set Driver Memory > Executor Memory

2017-06-10 Thread vaquar khan
You can add memory in your command; make sure the requested memory is
available on your executors.

./bin/spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master spark://207.184.161.138:7077 \
  --executor-memory 20G \
  --total-executor-cores 100 \
  /path/to/examples.jar \
  1000


https://spark.apache.org/docs/1.1.0/submitting-applications.html

Also try to avoid functions that need a lot of driver memory, like collect() etc.
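Since the OOM described below is in the driver, the driver memory flag may need raising as well;
an illustrative variant of the command above (the master URL and memory sizes are just the example
values, adjust them to your cluster):

./bin/spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master spark://207.184.161.138:7077 \
  --driver-memory 8G \
  --executor-memory 20G \
  --total-executor-cores 100 \
  /path/to/examples.jar \
  1000

Note that if the driver runs in client mode, spark.driver.memory cannot be set through SparkConf in
the application itself, because the driver JVM has already started; use --driver-memory or
spark-defaults.conf instead.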


Regards,
Vaquar khan


On Jun 4, 2017 5:46 AM, "Abdulfattah Safa"  wrote:

I'm working on Spark in Standalone Cluster mode. I need to increase the
Driver Memory as I got an OOM in the driver thread. I found that when
setting the Driver Memory > Executor Memory, the submitted job is stuck
at SUBMITTED and the application never starts.


Re: [Spark JDBC] Does spark support read from remote Hive server via JDBC

2017-06-10 Thread vaquar khan
Hi,
Please check your firewall security settings. Sharing one good link:

http://belablotski.blogspot.in/2016/01/access-hive-tables-from-spark-using.html?m=1
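For the symptom in the quoted message below (rows containing only the column names), a workaround
that is often suggested in this context is to register a Hive-aware JDBC dialect so that Spark
quotes identifiers with backticks instead of double quotes. This is a rough spark-shell sketch
using Spark's developer API, not verified against this exact setup:

import org.apache.spark.sql.jdbc.{JdbcDialect, JdbcDialects}

// Hive treats double-quoted identifiers as string literals, so the generated
// SELECT returns the column names themselves; backtick quoting avoids that.
object HiveDialect extends JdbcDialect {
  override def canHandle(url: String): Boolean = url.startsWith("jdbc:hive2")
  override def quoteIdentifier(colName: String): String = s"`$colName`"
}

JdbcDialects.registerDialect(HiveDialect)
// ...then run the spark.read.format("jdbc") code from the question as before.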



Regards,
Vaquar khan

On Jun 8, 2017 1:53 AM, "Patrik Medvedev"  wrote:

> Hello guys,
>
> Can somebody help me with my problem?
> Let me know, if you need more details.
>
>
> Wed, 7 Jun 2017 at 16:43, Patrik Medvedev :
>
>> No, I don't.
>>
>> Wed, 7 Jun 2017 at 16:42, Jean Georges Perrin :
>>
>>> Do you have some other security in place like Kerberos or impersonation?
>>> It may affect your access.
>>>
>>>
>>> jg
>>>
>>>
>>> On Jun 7, 2017, at 02:15, Patrik Medvedev 
>>> wrote:
>>>
>>> Hello guys,
>>>
>>> I need to execute Hive queries on a remote Hive server from Spark, but for
>>> some reason I receive only column names (without data).
>>> The data is available in the table; I checked it via HUE and a Java JDBC connection.
>>>
>>> Here is my code example:
>>> val test = spark.read
>>>   .option("url", "jdbc:hive2://remote.hive.server:1/work_base")
>>>   .option("user", "user")
>>>   .option("password", "password")
>>>   .option("dbtable", "some_table_with_data")
>>>   .option("driver", "org.apache.hive.jdbc.HiveDriver")
>>>   .format("jdbc")
>>>   .load()
>>> test.show()
>>>
>>>
>>> Scala version: 2.11
>>> Spark version: 2.1.0, i also tried 2.1.1
>>> Hive version: CDH 5.7 Hive 1.1.1
>>> Hive JDBC version: 1.1.1
>>>
>>> But this problem is present with later Hive versions, too.
>>> Could you help me with this issue? I didn't find anything in the
>>> mailing list answers or on StackOverflow.
>>> Or could you help me find the correct way to query a remote Hive from
>>> Spark?
>>>
>>> --
>>> *Cheers,*
>>> *Patrick*
>>>
>>>


Re: Scala, Python or Java for Spark programming

2017-06-10 Thread vaquar khan
It depends on programming style. I would suggest setting up a few rules to
avoid complex code in Scala and, if needed, asking programmers to add proper
comments.


Regards,
Vaquar khan

On Jun 8, 2017 4:17 AM, "JB Data"  wrote:

> Java is an object language born to data, Python is a data language born to
> objects, or something like that... Each one has its own uses.
>
>
>
> @JBD 
>
>
> 2017-06-08 8:44 GMT+02:00 Jörn Franke :
>
>> A slight advantage of Java is also the tooling that exist around it -
>> better support by build tools and plugins, advanced static code analysis
>> (security, bugs, performance) etc.
>>
>> On 8. Jun 2017, at 08:20, Mich Talebzadeh 
>> wrote:
>>
>> What I like about Scala is that it is less ceremonial compared to Java.
>> Java users claim that because Scala is built on Java, error tracking is very
>> difficult. Also, Scala sits on top of Java, and that makes it virtually
>> dependent on Java.
>>
>> For me the advantage of Scala is its simplicity and compactness. I can
>> write Spark Streaming code in Scala pretty fast, or import a massive RDBMS
>> table into Hive, into a table of my own design, equally fast using Scala.
>>
>> I don't know, maybe I cannot be bothered writing 100 lines of Java for a
>> simple query from a table :)
>>
>> Dr Mich Talebzadeh
>>
>>
>>
>> LinkedIn:
>> https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>
>>
>>
>> http://talebzadehmich.wordpress.com
>>
>>
>> *Disclaimer:* Use it at your own risk. Any and all responsibility for
>> any loss, damage or destruction of data or any other property which may
>> arise from relying on this email's technical content is explicitly
>> disclaimed. The author will in no case be liable for any monetary damages
>> arising from such loss, damage or destruction.
>>
>>
>>
>> On 8 June 2017 at 00:11, Matt Tenenbaum 
>> wrote:
>>
>>> A lot depends on your context as well. If I'm using Spark _for
>>> analysis_, I frequently use python; it's a starting point, from which I can
>>> then leverage pandas, matplotlib/seaborn, and other powerful tools
>>> available on top of python.
>>>
>>> If the Spark outputs are the ends themselves, rather than the means to
>>> further exploration, Scala still feels like the "first class"
>>> language---most thorough feature set, best debugging support, etc.
>>>
>>> More crudely: if the eventual goal is a dataset, I tend to prefer Scala;
>>> if it's a visualization or some summary values, I tend to prefer Python.
>>>
>>> Of course, I also agree that this is more theological than technical.
>>> Appropriately size your grains of salt.
>>>
>>> Cheers
>>> -mt
>>>
>>> On Wed, Jun 7, 2017 at 12:39 PM, Bryan Jeffrey 
>>> wrote:
>>>
 Mich,

 We use Scala for a large project.  On our team we've set a few
 standards to ensure readability (we try to avoid excessive use of tuples,
 use named functions, etc.)  Given these constraints, I find Scala to be
 very readable, and far easier to use than Java.  The Lambda functionality
 of Java provides a lot of similar features, but the amount of typing
 required to set down a small function is excessive at best!

 Regards,

 Bryan Jeffrey

 On Wed, Jun 7, 2017 at 12:51 PM, Jörn Franke 
 wrote:

> I think this is a religious question ;-)
> Java is often underestimated, because people are not aware of its
> lambda functionality, which makes the code very readable. Scala - it depends
> who programs it. People coming from a normal Java background write
> Java-like code in Scala, which might not be so good. People from a
> functional background write it in a more functional style - i.e. you have a lot
> of things in one line of code, which can be a curse even for other
> functional programmers, especially if the application is distributed, as in
> the case of Spark. Usually no comment is provided and you have - even as a
> functional programmer - to do a lot of drilling down. Python is somewhat
> similar, but since it has no connection with Java you do not have these
> extremes. There it depends more on the community (e.g. medical, financials)
> and the skills of the people how the code looks.
> However, the difficulty comes with the distributed applications behind
> Spark, which may have unforeseen side effects if the users do not know this,
> i.e. if they have never been used to parallel programming.
>
> On 7. Jun 2017, at 17:20, Mich Talebzadeh 
> wrote:
>
>
> Hi,
>
> I am a fan of Scala and functional programming hence I prefer Scala.
>
> I had a discussion with a hardcore Java programmer and a data
> scientist who prefers 

Re: Read Data From NFS

2017-06-10 Thread vaquar khan
Hi Ayan,

If you have multiple files (for example 12 files) and you are using the
following code, then you will get 12 partitions.

r = sc.textFile("file://my/file/*")

Not sure what else you want to know about the file system; please check the API doc.
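As far as I know, sc.textFile goes through Hadoop's TextInputFormat for both file:// and hdfs://
paths, so the input format is the same; the difference is that a local/NFS path has no HDFS block
locality. A quick spark-shell check (the path and the number are placeholders):

// minPartitions is only a hint; getNumPartitions shows what was actually created
val r = sc.textFile("file:///my/file", minPartitions = 16)
r.getNumPartitions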


Regards,
Vaquar khan

On Jun 8, 2017 10:44 AM, "ayan guha"  wrote:

Any one?

On Thu, 8 Jun 2017 at 3:26 pm, ayan guha  wrote:

> Hi Guys
>
> Quick one: how does Spark deal with (i.e. create partitions for) large files
> sitting on NFS, assuming all executors can see the file in exactly the same way?
>
> ie, when I run
>
> r = sc.textFile("file://my/file")
>
> what happens if the file is on NFS?
>
> is there any difference from
>
> r = sc.textFile("hdfs://my/file")
>
> Are the input formats used the same in both cases?
>
>
> --
> Best Regards,
> Ayan Guha
>
-- 
Best Regards,
Ayan Guha


Re: problem initiating spark context with pyspark

2017-06-10 Thread Felix Cheung
Curtis, assuming you are running a somewhat recent Windows version, you would
not have access to C:\tmp in your command example:

winutils.exe ls -F C:\tmp\hive

Try changing the path to under your user directory.

Running Spark on Windows should work :)
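For example (the user name below is a placeholder), the same check can be run against a directory
under the user profile after granting permissions with winutils:

mkdir C:\Users\<your-user>\tmp\hive
C:\winutils\bin\winutils.exe chmod 777 C:\Users\<your-user>\tmp\hive
C:\winutils\bin\winutils.exe ls -F C:\Users\<your-user>\tmp\hive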


From: Curtis Burkhalter 
Sent: Wednesday, June 7, 2017 7:46:56 AM
To: Doc Dwarf
Cc: user@spark.apache.org
Subject: Re: problem initiating spark context with pyspark

Thanks Doc, I saw this on another board yesterday, so I tried it by first
going to the directory where I've stored winutils.exe and then, as an admin,
running the command that you suggested, and I get this exception when checking
the permissions:

C:\winutils\bin>winutils.exe ls -F C:\tmp\hive
FindFileOwnerAndPermission error (1789): The trust relationship between this 
workstation and the primary domain failed.

I'm fairly new to the command line and to working out what the different
exceptions mean. Do you have any advice on what this error means and how I
might go about fixing it?

Thanks again


On Wed, Jun 7, 2017 at 9:51 AM, Doc Dwarf 
> wrote:
Hi Curtis,

I believe in Windows the following command needs to be executed (winutils must
be installed):

D:\winutils\bin\winutils.exe chmod 777 D:\tmp\hive



On 6 June 2017 at 09:45, Curtis Burkhalter 
> wrote:
Hello all,

I'm new to Spark and I'm trying to interact with it using Pyspark. I'm using 
the prebuilt version of spark v. 2.1.1 and when I go to the command line and 
use the command 'bin\pyspark' I have initialization problems and get the 
following message:

C:\spark\spark-2.1.1-bin-hadoop2.7> bin\pyspark
Python 3.6.0 |Anaconda 4.3.1 (64-bit)| (default, Dec 23 2016, 11:57:41) [MSC 
v.1900 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use 
setLogLevel(newLevel).
17/06/06 10:30:14 WARN NativeCodeLoader: Unable to load native-hadoop library 
for your platform... using builtin-java classes where applicable
17/06/06 10:30:21 WARN ObjectStore: Version information not found in metastore. 
hive.metastore.schema.verification is not enabled so recording the schema 
version 1.2.0
17/06/06 10:30:21 WARN ObjectStore: Failed to get database default, returning 
NoSuchObjectException
Traceback (most recent call last):
  File "C:\spark\spark-2.1.1-bin-hadoop2.7\python\pyspark\sql\utils.py", line 
63, in deco
return f(*a, **kw)
  File 
"C:\spark\spark-2.1.1-bin-hadoop2.7\python\lib\py4j-0.10.4-src.zip\py4j\protocol.py",
 line 319, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o22.sessionState.
: java.lang.IllegalArgumentException: Error while instantiating 
'org.apache.spark.sql.hive.HiveSessionState':
at 
org.apache.spark.sql.SparkSession$.org$apache$spark$sql$SparkSession$$reflect(SparkSession.scala:981)
at 
org.apache.spark.sql.SparkSession.sessionState$lzycompute(SparkSession.scala:110)
at 
org.apache.spark.sql.SparkSession.sessionState(SparkSession.scala:109)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:280)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:214)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at 
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at 
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at 
org.apache.spark.sql.SparkSession$.org$apache$spark$sql$SparkSession$$reflect(SparkSession.scala:978)
... 13 more
Caused by: java.lang.IllegalArgumentException: Error while instantiating 
'org.apache.spark.sql.hive.HiveExternalCatalog':
at 
org.apache.spark.sql.internal.SharedState$.org$apache$spark$sql$internal$SharedState$$reflect(SharedState.scala:169)
at 

Re: problem initiating spark context with pyspark

2017-06-10 Thread Marco Mistroni
Ha... it's a one-off. I run Spark on Ubuntu and Docker on Windows... I
don't think Spark and Windows are best friends.

On Jun 10, 2017 6:36 PM, "Gourav Sengupta" 
wrote:

> seeing for the very first time someone try SPARK on Windows :)
>
> On Thu, Jun 8, 2017 at 8:38 PM, Marco Mistroni 
> wrote:
>
>> try this link
>>
>> http://letstalkspark.blogspot.co.uk/2016/02/getting-started-
>> with-spark-on-window-64.html
>>
>> it helped me when i had similar problems with windows...
>>
>> hth
>>
>> On Wed, Jun 7, 2017 at 3:46 PM, Curtis Burkhalter <
>> curtisburkhal...@gmail.com> wrote:
>>
>>> Thanks Doc I saw this on another board yesterday so I've tried this by
>>> first going to the directory where I've stored the winutils.exe and then
>>> as an admin running the command  that you suggested and I get this
>>> exception when checking the permissions:
>>>
>>> C:\winutils\bin>winutils.exe ls -F C:\tmp\hive
>>> FindFileOwnerAndPermission error (1789): The trust relationship between
>>> this workstation and the primary domain failed.
>>>
>>> I'm fairly new to the command line and determining what the different
>>> exceptions mean. Do you have any advice what this error means and how I
>>> might go about fixing this?
>>>
>>> Thanks again
>>>
>>>
>>> On Wed, Jun 7, 2017 at 9:51 AM, Doc Dwarf  wrote:
>>>
 Hi Curtis,

 I believe in windows, the following command needs to be executed: (will
 need winutils installed)

 D:\winutils\bin\winutils.exe chmod 777 D:\tmp\hive



 On 6 June 2017 at 09:45, Curtis Burkhalter 
 wrote:

> Hello all,
>
> I'm new to Spark and I'm trying to interact with it using Pyspark. I'm
> using the prebuilt version of spark v. 2.1.1 and when I go to the command
> line and use the command 'bin\pyspark' I have initialization problems and
> get the following message:
>
> C:\spark\spark-2.1.1-bin-hadoop2.7> bin\pyspark
> Python 3.6.0 |Anaconda 4.3.1 (64-bit)| (default, Dec 23 2016,
> 11:57:41) [MSC v.1900 64 bit (AMD64)] on win32
> Type "help", "copyright", "credits" or "license" for more information.
> Using Spark's default log4j profile: org/apache/spark/log4j-default
> s.properties
> Setting default log level to "WARN".
> To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use
> setLogLevel(newLevel).
> 17/06/06 10:30:14 WARN NativeCodeLoader: Unable to load native-hadoop
> library for your platform... using builtin-java classes where applicable
> 17/06/06 10:30:21 WARN ObjectStore: Version information not found in
> metastore. hive.metastore.schema.verification is not enabled so
> recording the schema version 1.2.0
> 17/06/06 10:30:21 WARN ObjectStore: Failed to get database default,
> returning NoSuchObjectException
> Traceback (most recent call last):
>   File "C:\spark\spark-2.1.1-bin-hadoop2.7\python\pyspark\sql\utils.py",
> line 63, in deco
> return f(*a, **kw)
>   File 
> "C:\spark\spark-2.1.1-bin-hadoop2.7\python\lib\py4j-0.10.4-src.zip\py4j\protocol.py",
> line 319, in get_return_value
> py4j.protocol.Py4JJavaError: An error occurred while calling
> o22.sessionState.
> : java.lang.IllegalArgumentException: Error while instantiating
> 'org.apache.spark.sql.hive.HiveSessionState':
> at org.apache.spark.sql.SparkSess
> ion$.org$apache$spark$sql$SparkSession$$reflect(SparkSession
> .scala:981)
> at org.apache.spark.sql.SparkSess
> ion.sessionState$lzycompute(SparkSession.scala:110)
> at org.apache.spark.sql.SparkSess
> ion.sessionState(SparkSession.scala:109)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccess
> orImpl.invoke(NativeMethodAccessorImpl.java:62)
> at sun.reflect.DelegatingMethodAc
> cessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at py4j.reflection.MethodInvoker.
> invoke(MethodInvoker.java:244)
> at py4j.reflection.ReflectionEngi
> ne.invoke(ReflectionEngine.java:357)
> at py4j.Gateway.invoke(Gateway.java:280)
> at py4j.commands.AbstractCommand.
> invokeMethod(AbstractCommand.java:132)
> at py4j.commands.CallCommand.execute(CallCommand.java:79)
> at py4j.GatewayConnection.run(GatewayConnection.java:214)
> at java.lang.Thread.run(Thread.java:748)
> Caused by: java.lang.reflect.InvocationTargetException
> at sun.reflect.NativeConstructorA
> ccessorImpl.newInstance0(Native Method)
> at sun.reflect.NativeConstructorA
> ccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
> at sun.reflect.DelegatingConstruc

Re: problem initiating spark context with pyspark

2017-06-10 Thread Gourav Sengupta
seeing for the very first time someone try SPARK on Windows :)

On Thu, Jun 8, 2017 at 8:38 PM, Marco Mistroni  wrote:

> try this link
>
> http://letstalkspark.blogspot.co.uk/2016/02/getting-started-
> with-spark-on-window-64.html
>
> it helped me when i had similar problems with windows...
>
> hth
>
> On Wed, Jun 7, 2017 at 3:46 PM, Curtis Burkhalter <
> curtisburkhal...@gmail.com> wrote:
>
>> Thanks Doc I saw this on another board yesterday so I've tried this by
>> first going to the directory where I've stored the winutils.exe and then
>> as an admin running the command  that you suggested and I get this
>> exception when checking the permissions:
>>
>> C:\winutils\bin>winutils.exe ls -F C:\tmp\hive
>> FindFileOwnerAndPermission error (1789): The trust relationship between
>> this workstation and the primary domain failed.
>>
>> I'm fairly new to the command line and determining what the different
>> exceptions mean. Do you have any advice what this error means and how I
>> might go about fixing this?
>>
>> Thanks again
>>
>>
>> On Wed, Jun 7, 2017 at 9:51 AM, Doc Dwarf  wrote:
>>
>>> Hi Curtis,
>>>
>>> I believe in windows, the following command needs to be executed: (will
>>> need winutils installed)
>>>
>>> D:\winutils\bin\winutils.exe chmod 777 D:\tmp\hive
>>>
>>>
>>>
>>> On 6 June 2017 at 09:45, Curtis Burkhalter 
>>> wrote:
>>>
 Hello all,

 I'm new to Spark and I'm trying to interact with it using Pyspark. I'm
 using the prebuilt version of spark v. 2.1.1 and when I go to the command
 line and use the command 'bin\pyspark' I have initialization problems and
 get the following message:

 C:\spark\spark-2.1.1-bin-hadoop2.7> bin\pyspark
 Python 3.6.0 |Anaconda 4.3.1 (64-bit)| (default, Dec 23 2016, 11:57:41)
 [MSC v.1900 64 bit (AMD64)] on win32
 Type "help", "copyright", "credits" or "license" for more information.
 Using Spark's default log4j profile: org/apache/spark/log4j-default
 s.properties
 Setting default log level to "WARN".
 To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use
 setLogLevel(newLevel).
 17/06/06 10:30:14 WARN NativeCodeLoader: Unable to load native-hadoop
 library for your platform... using builtin-java classes where applicable
 17/06/06 10:30:21 WARN ObjectStore: Version information not found in
 metastore. hive.metastore.schema.verification is not enabled so
 recording the schema version 1.2.0
 17/06/06 10:30:21 WARN ObjectStore: Failed to get database default,
 returning NoSuchObjectException
 Traceback (most recent call last):
   File "C:\spark\spark-2.1.1-bin-hadoop2.7\python\pyspark\sql\utils.py",
 line 63, in deco
 return f(*a, **kw)
   File 
 "C:\spark\spark-2.1.1-bin-hadoop2.7\python\lib\py4j-0.10.4-src.zip\py4j\protocol.py",
 line 319, in get_return_value
 py4j.protocol.Py4JJavaError: An error occurred while calling
 o22.sessionState.
 : java.lang.IllegalArgumentException: Error while instantiating
 'org.apache.spark.sql.hive.HiveSessionState':
 at org.apache.spark.sql.SparkSession$.org$apache$spark$sql$Spar
 kSession$$reflect(SparkSession.scala:981)
 at org.apache.spark.sql.SparkSession.sessionState$lzycompute(Sp
 arkSession.scala:110)
 at org.apache.spark.sql.SparkSession.sessionState(SparkSession.
 scala:109)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAcce
 ssorImpl.java:62)
 at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMe
 thodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:498)
 at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
 at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.jav
 a:357)
 at py4j.Gateway.invoke(Gateway.java:280)
 at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.j
 ava:132)
 at py4j.commands.CallCommand.execute(CallCommand.java:79)
 at py4j.GatewayConnection.run(GatewayConnection.java:214)
 at java.lang.Thread.run(Thread.java:748)
 Caused by: java.lang.reflect.InvocationTargetException
 at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native
 Method)
 at sun.reflect.NativeConstructorAccessorImpl.newInstance(Native
 ConstructorAccessorImpl.java:62)
 at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(De
 legatingConstructorAccessorImpl.java:45)
 at java.lang.reflect.Constructor.newInstance(Constructor.java:4
 23)
 at org.apache.spark.sql.SparkSession$.org$apache$spark$sql$Spar
 kSession$$reflect(SparkSession.scala:978)
 ... 13 more
 Caused by: 

[jira] Lantao Jin shared "SPARK-21023: Ignore to load default properties file is not a good choice from the perspective of system" with you

2017-06-10 Thread Lantao Jin (JIRA)
Lantao Jin shared an issue with you


Hi all,
Do you think this is a bug?
Should we keep the current behavior?

> Ignore to load default properties file is not a good choice from the 
> perspective of system
> --
>
> Key: SPARK-21023
> URL: https://issues.apache.org/jira/browse/SPARK-21023
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Submit
>Affects Versions: 2.1.1
>Reporter: Lantao Jin
>Priority: Minor
>
> The default properties file {{spark-defaults.conf}} shouldn't be skipped even
> when the submit arg {{--properties-file}} is set. The reasons are easy to see:
> * The infrastructure team needs to continually update {{spark-defaults.conf}}
> when they want to set something as a default for the entire cluster for tuning
> purposes.
> * Application developers only want to override the parameters they care about,
> rather than others they may not even know about (set by the infrastructure team).
> * The purpose of using {{\-\-properties-file}} for most application developers
> is to avoid setting dozens of {{--conf k=v}}. But if {{spark-defaults.conf}} is
> ignored, the behaviour ends up being unexpected.
> For example:
> Current implementation
> ||Property name||Value in default||Value in user-specified||Final value||
> |spark.A|"foo"|"bar"|"bar"|
> |spark.B|"foo"|N/A|N/A|
> |spark.C|N/A|"bar"|"bar"|
> |spark.D|"foo"|"foo"|"foo"|
> |spark.E|"foo"|N/A|N/A|
> |spark.F|"foo"|N/A|N/A|
> Expected (correct) implementation
> ||Property name||Value in default||Value in user-specified||Final value||
> |spark.A|"foo"|"bar"|"bar"|
> |spark.B|"foo"|N/A|"foo"|
> |spark.C|N/A|"bar"|"bar"|
> |spark.D|"foo"|"foo"|"foo"|
> |spark.E|"foo"|"foo"|"foo"|
> |spark.F|"foo"|"foo"|"foo"|
> I can offer a patch to fix this if you think it makes sense.
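A minimal Scala sketch of the merge semantics described in the "Expected" table above (this helper
is purely illustrative, not Spark's actual SparkSubmit code): the defaults file is always loaded,
and user-specified keys override it entry by entry.

// Illustrative only: defaults always apply, user-specified keys win.
def mergeProps(defaults: Map[String, String],
               userSpecified: Map[String, String]): Map[String, String] =
  defaults ++ userSpecified

val defaults      = Map("spark.A" -> "foo", "spark.B" -> "foo")
val userSpecified = Map("spark.A" -> "bar", "spark.C" -> "bar")
mergeProps(defaults, userSpecified)
// => spark.A=bar, spark.B=foo, spark.C=bar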

 Also shared with
  d...@spark.apache.org


