[ 
https://issues.apache.org/jira/browse/SPARK-925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14325317#comment-14325317
 ] 

Nicholas Chammas commented on SPARK-925:
----------------------------------------

I would prefer a format that is more human friendly and that supports comments 
directly. To me, JSON is better for data exchange, and YAML is better for 
config files and other things that humans are going to be dealing with directly.
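For instance, a spark-ec2 defaults file (key names here are just illustrative, mirroring the options discussed in this ticket) could document itself inline in YAML, which JSON has no syntax for:

```yaml
# Hypothetical spark-ec2 defaults -- key names are illustrative only.
region: ap-southeast-1   # comments can explain each setting in place
zone: ""                 # empty means "let AWS choose"
slaves: 1
instance-type: m1.large
```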

It's true that there are other config formats used in Spark. The ones under 
[conf/|https://github.com/apache/spark/tree/master/conf], however, are not 
JSON. Which ones were you thinking of?

As long as the config format is consistent within a sub-project, I think it's 
OK. Since spark-ec2 doesn't have any config files yet, I don't think it's bad 
to go with YAML.

{quote}
With JSON we deal with internally, we have started to nest definitions so that 
it is easy for someone to modify one small setting without having to specify 
all the other settings, and as a workaround for comments.
{quote}

As discussed before, YAML supports comments directly, which IMO is essential 
for a config format. As for modifying a setting without specifying everything 
else, I'm not sure I understand the use case.

If we define a config file resolution order (first check /first/config, then 
check /second/config, etc.), is it that bad if people just copy the default 
config from /second/config to /first/config and modify what they want? I 
believe that's how it generally works in tools that check multiple places for 
configuration.
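A minimal sketch of that resolution order, assuming hypothetical search paths (spark-ec2 doesn't define these anywhere yet):

```python
import os

# Hypothetical search paths, checked in order; purely illustrative.
SEARCH_PATHS = [
    os.path.join(os.getcwd(), "spark-ec2.yaml"),      # 1. current directory
    os.path.expanduser("~/spark-ec2/spark-ec2.yaml"),  # 2. user config dir
]

def find_config(paths=SEARCH_PATHS):
    # Return the first config file that exists, or None if none do.
    for path in paths:
        if os.path.isfile(path):
            return path
    return None
```

Copying the full default config into the highest-priority location then behaves exactly as described: the first file found wins outright.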

A better way to do this would probably be to allow people to specify a subset 
of options in any given file, with each option set merged on top of the options 
from the preceding file. That seems like more complexity than it's worth at 
this time, though.
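The merge scheme would amount to something like this (a sketch, not anything spark-ec2 implements):

```python
def merge_options(*option_sets):
    # Merge option dicts left to right: each later (higher-priority)
    # set overrides keys from the earlier ones.
    merged = {}
    for opts in option_sets:
        merged.update(opts)
    return merged

# A file later in the chain only needs the keys it wants to change:
defaults = {"region": "ap-southeast-1", "slaves": 1}
user = {"slaves": 5}
print(merge_options(defaults, user))  # {'region': 'ap-southeast-1', 'slaves': 5}
```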

> Allow ec2 scripts to load default options from a json file
> ----------------------------------------------------------
>
>                 Key: SPARK-925
>                 URL: https://issues.apache.org/jira/browse/SPARK-925
>             Project: Spark
>          Issue Type: Improvement
>          Components: EC2
>    Affects Versions: 0.8.0
>            Reporter: Shay Seng
>            Priority: Minor
>
> The option list for the ec2 script can be a little irritating to type in, 
> especially things like the path to the identity file, region, zone, AMI, etc.
> It would be nice if ec2 script looks for an options.json file in the 
> following order: (1) PWD, (2) ~/spark-ec2, (3) same dir as spark_ec2.py
> Something like:
>
> import json
> import os
> import stat
> import sys
>
> def get_defaults_from_options():
>     # Check to see if an options.json file exists; if so, load it.
>     # Values in options.json may only override values in opts that are
>     # None or "" -- i.e. command-line options take precedence.
>     defaults = {
>         'aws-access-key-id': '', 'aws-secret-access-key': '',
>         'key-pair': '', 'identity-file': '', 'region': 'ap-southeast-1',
>         'zone': '', 'ami': '', 'slaves': 1, 'instance-type': 'm1.large',
>     }
>     # Look for options.json in the directory the cluster was launched from.
>     # Had to modify the spark_ec2 wrapper script since it mangles the pwd.
>     startwd = os.environ.get('STARTWD', os.getcwd())
>     if os.path.exists(os.path.join(startwd, "options.json")):
>         optionspath = os.path.join(startwd, "options.json")
>     else:
>         optionspath = os.path.join(os.getcwd(), "options.json")
>     try:
>         print "Loading options file: ", optionspath
>         with open(optionspath) as json_data:
>             jdata = json.load(json_data)
>             for k in jdata:
>                 defaults[k] = jdata[k]
>     except IOError:
>         print 'Warning: options.json file not loaded'
>     # Check permissions on identity-file, if defined; otherwise the launch
>     # will fail late, which is irritating.
>     if defaults['identity-file'] != '':
>         st = os.stat(defaults['identity-file'])
>         user_can_read = bool(st.st_mode & stat.S_IRUSR)
>         grp_perms = bool(st.st_mode & stat.S_IRWXG)
>         others_perm = bool(st.st_mode & stat.S_IRWXO)
>         if not user_can_read:
>             print "No permission to read ", defaults['identity-file']
>             sys.exit(1)
>         if grp_perms or others_perm:
>             print "Permissions too open; please chmod 600 ", defaults['identity-file']
>             sys.exit(1)
>     # If defaults contain an AWS access id or secret key, export them to
>     # the environment; required for boto to access the AWS console.
>     if defaults['aws-access-key-id'] != '':
>         os.environ['AWS_ACCESS_KEY_ID'] = defaults['aws-access-key-id']
>     if defaults['aws-secret-access-key'] != '':
>         os.environ['AWS_SECRET_ACCESS_KEY'] = defaults['aws-secret-access-key']
>     return defaults



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
