brushworth opened a new pull request #317:
URL: https://github.com/apache/rya/pull/317


   …consistently to a well-defined standard. This commit begins to tidy the 
Accumulo config (MongoDB to come) but more work is required.
   
   ## Description
   >What Changed?
   
   This pull request is a DRAFT seeking feedback from the community.
   
   The current structure and documentation for `environment.properties` and 
the Spring XML configuration are lacking. Getting advanced features working in 
Tomcat, for example the AccumuloSelectivityEvalDAO or the various indexing 
strategies, is a hard slog, particularly for newcomers to the project.
   
   I've started to tidy up the Accumulo and extensions Spring XML files, and 
I've tested both on a test cluster. I'm not 100% sure the 
AccumuloSelectivityEvalDAO is working fully, but it seems to be running. I have 
another branch of Rya with extra debug logging that I will try tomorrow to 
check.
   
   I propose we add default properties files to the project that work out of 
the box with Fluo Muchos (or similar) to allow for easy spin up of a 
development cluster to give Rya a whirl, for example on AWS or Azure. People 
can then easily edit them to their environment, rather than having to reverse 
engineer what parameters are available and what default values are in use.
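
To make the "shipped defaults that people then edit" idea concrete, here is a minimal sketch using the layered defaults of `java.util.Properties`. The class name `LayeredProps` and the default values are hypothetical placeholders, not values taken from Rya; only the key names `rya.tableprefix` and `rya.usestats` come from the existing XML.

```java
import java.util.Properties;

// Hypothetical helper, not part of Rya: layers site-specific overrides
// on top of defaults that would ship with the project.
final class LayeredProps {

    static Properties withDefaults(Properties overrides) {
        // Placeholder defaults; real values would come from a bundled file.
        Properties defaults = new Properties();
        defaults.setProperty("rya.tableprefix", "rya_");
        defaults.setProperty("rya.usestats", "false");

        // Lookups on 'effective' fall back to 'defaults' when a key is unset.
        Properties effective = new Properties(defaults);
        for (String key : overrides.stringPropertyNames()) {
            effective.setProperty(key, overrides.getProperty(key));
        }
        return effective;
    }
}
```

With this shape, a user's file only needs to list the keys that differ from the shipped defaults.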
   
   I'm happy to do MongoDB configuration too but I don't have a test cluster 
running at present.
   
   I'm after feedback from more experienced developers of Rya about whether 
these changes are heading in the correct direction. For example, I've replaced 
some of the configuration xml calling setter methods like this:
   
   ```
       <bean id="conf" class="org.apache.rya.accumulo.AccumuloRdfConfiguration">
           <!-- Calls setter method name -->
           <property name="tablePrefix" value="${rya.tableprefix}"/>
           <property name="displayQueryPlan" value="${rya.displayqueryplan}"/>
           <property name="useStats" value="false"/>
           <property name="useStats" value="${rya.usestats}"/>
           <property name="useSelectivity" value="${rya.useselectivity}"/>
           <property name="useStatementMetadata" value="${rya.usestatementmetadata}"/>
           <property name="numThreads" value="${rya.querythreads}"/>
           <property name="batchSize" value="${rya.batchsize}"/>
           <property name="dataWaveEdge" value="${rya.datawaveedge}"/>
           <property name="dataType" value="org.eclipse.rdf4j.model.Statement"/>
           <!--
           <property name="useEntity" value="${sc.use_entity}"/>
           <property name="useGeo" value="${sc.use_geo}"/>
           <property name="useFreeText" value="${sc.use_freetext}"/>
           <property name="useTemporal" value="${sc.use_temporal}"/>
           -->
       </bean>
   ```
    with 
   ```
       <bean id="conf" class="org.apache.rya.accumulo.AccumuloRdfConfiguration" factory-method="fromProperties">
           <constructor-arg ref="properties"/>
       </bean>
   ```
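
For reviewers less familiar with Spring's `factory-method` attribute: Spring calls the named static method on the class and passes the `constructor-arg` values as its arguments. The sketch below illustrates that pattern with a hypothetical `SketchConfig` class. It is not the body of `AccumuloRdfConfiguration.fromProperties`, and the fallback values are assumptions for illustration only.

```java
import java.util.Properties;

// Hypothetical stand-in for a configuration class; NOT Rya's implementation.
final class SketchConfig {
    private String tablePrefix;
    private boolean useStats;
    private int numThreads;

    private SketchConfig() {
    }

    // Static factory of the kind Spring invokes via factory-method="fromProperties".
    static SketchConfig fromProperties(Properties props) {
        SketchConfig conf = new SketchConfig();
        // Fallback values ("rya_", "false", "1") are illustrative assumptions.
        conf.tablePrefix = props.getProperty("rya.tableprefix", "rya_");
        conf.useStats = Boolean.parseBoolean(props.getProperty("rya.usestats", "false"));
        conf.numThreads = Integer.parseInt(props.getProperty("rya.querythreads", "1"));
        return conf;
    }

    String getTablePrefix() {
        return tablePrefix;
    }

    boolean isUseStats() {
        return useStats;
    }

    int getNumThreads() {
        return numThreads;
    }
}
```

The practical difference from the setter-based XML is that the mapping from property keys to fields lives in one place in Java, so the XML no longer has to list every property.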
   
   Additionally, there is a lot of duplication in the properties space. I'm 
trying to understand why it exists and how the duplicated properties differ.
   
   For example, the environment properties `accumulo.instance`, 
`instance.name` and `sc.cloudbase.instancename` appear in different places and 
all seem to refer to the same thing. Is this redundancy deliberate, or can I 
start consolidating it? Can we get down to a single Rya configuration 
properties file for an entire Rya installation (e.g. ingest jobs, Tomcat, Fluo, 
etc.)?
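
One possible migration path, sketched below, is to keep accepting the three existing keys for a while but resolve them to a single value and fail fast if they disagree. The class `InstanceNameResolver` is hypothetical and the assumption that all three keys mean the same thing is exactly the open question above.

```java
import java.util.Properties;

// Hypothetical helper, not part of Rya: collapses the three instance-name
// keys mentioned above into one value.
final class InstanceNameResolver {

    // Keys taken from the description; treating them as synonyms is an assumption.
    private static final String[] INSTANCE_KEYS = {
        "accumulo.instance", "instance.name", "sc.cloudbase.instancename"
    };

    // Returns the single configured instance name, or null if none is set.
    // Throws if two keys are set to different values.
    static String resolve(Properties props) {
        String found = null;
        for (String key : INSTANCE_KEYS) {
            String value = props.getProperty(key);
            if (value == null) {
                continue;
            }
            if (found != null && !found.equals(value)) {
                throw new IllegalStateException(
                    "Conflicting instance names: " + found + " vs " + value);
            }
            found = value;
        }
        return found;
    }
}
```

Because conflicting values are rejected rather than silently prioritised, the order of the keys in the array does not affect the result.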
   
   I'm still a little lost in the details here, and any background or advice 
would be much appreciated.
   
   ### Tests
   >Coverage?
   
   N/A
   
   ### Links
   [Jira RYA-70](https://issues.apache.org/jira/browse/RYA-70)
   
   ### Checklist
   - [ ] Code Review
   - [ ] Squash Commits
   
   #### People To Review
   [Add those who should review this]
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

