Author: lewismc
Date: Tue Jun 10 16:08:08 2014
New Revision: 1601670
URL: http://svn.apache.org/r1601670
Log:
Complete gora-solr documentation
Modified:
gora/site/trunk/content/current/gora-solr.md
Modified: gora/site/trunk/content/current/gora-solr.md
URL:
http://svn.apache.org/viewvc/gora/site/trunk/content/current/gora-solr.md?rev=1601670&r1=1601669&r2=1601670&view=diff
==============================================================================
--- gora/site/trunk/content/current/gora-solr.md (original)
+++ gora/site/trunk/content/current/gora-solr.md Tue Jun 10 16:08:08 2014
@@ -1,77 +1,176 @@
Title: Gora HBase Module
##Overview
-This is the main documentation for the gora-hbase module. gora-hbase
-module enables [Apache HBase](http://hbase.apache.org) backend support for
Gora.
+This is the main documentation for the gora-solr module. gora-solr
+module enables [Apache Solr](http://lucene.apache.org/solr) backend support
for Gora.
##gora.properties
-* <code>gora.datastore.default=org.apache.gora.hbase.store.HBaseStore</code> -
Implementation of the storage class
+* <code>gora.datastore.default=org.apache.gora.solr.store.SolrStore</code> -
Implementation of the storage class
* <code>gora.datastore.autocreateschema=true</code> - Create the table if it
doesn't exist
-* <code>gora.datastore.scanner.caching=1000</code> - HBase client cache that
improves the scan in HBase (default 0)
-* <code>hbase.client.autoflush.default=false</code> - HBase autoflushing.
Enabling autoflush decreases write performance. Available since Gora 0.2.
Defaults to disabled.
+* <code>gora.solrstore.solr.url=http://localhost:9876/solr</code> - The URL of
the Solr server.
+* <code>gora.solrstore.solr.config</code> - The <code>solrconfig.xml</code>
file to be used.
+* <code>gora.solrstore.solr.schema</code> - The <code>schema.xml</code> file
to be used.
+* <code>gora.solrstore.solr.batchSize</code> - The number of SolrDocuments
batched together (in an ArrayList) for each write to Solr. A default value of
<b>100</b> is used if this value is absent. This value must be of type
<b>Integer</b>.
+* <code>gora.solrstore.solr.solrjserver</code> - The solrj implementation to
use. This has a default value of <b>http</b> for <i>HttpSolrServer</i>.
Available options include <b>http</b> (<i>HttpSolrServer</i>), <b>cloud</b>
(<i>CloudSolrServer</i>), <b>concurrent</b> (<i>ConcurrentUpdateSolrServer</i>)
and <b>loadbalance</b> (<i>LBSolrServer</i>). This value must be of type
<b>String</b>.
+* <code>gora.solrstore.solr.commitWithin</code> - A batch commit unit for
SolrDocuments used when making (commit) calls to Solr. A default value of 1000
is used if this value is absent. This value must be of type <b>Integer</b>.
+* <code>gora.solrstore.solr.resultsSize</code> - The maximum number of results
to return when we make a call to
<code>org.apache.gora.solr.store.SolrStore#execute(Query)</code>. This value
must be of type <b>Integer</b>.
-##Gora HBase mappings
-Say we wished to map some Employee data and store it into the HBaseStore.
+##Gora Solr mappings
+Say we wished to map some Employee data and store it into the SolrStore.
<gora-orm>
- <table name="Employee">
- <family name="info"
- compression="$$$"
- blockCache="$$$"
- blockSize="$$$"
- bloomFilter="$$$"
- maxVersions="$$$"
- timeToLive="$$$"
- inMemory="$$$" />
- </table>
-
- <class name="org.apache.gora.examples.generated.Employee"
keyClass="java.lang.String" table="Employee">
- <field name="name" family="info" qualifier="nm"/>
- <field name="dateOfBirth" family="info" qualifier="db"/>
- <field name="ssn" family="info" qualifier="sn"/>
- <field name="salary" family="info" qualifier="sl"/>
- <field name="boss" family="info" qualifier="bs"/>
- <field name="webpage" family="info" qualifier="wp"/>
- </class>
+ <class name="org.apache.gora.examples.generated.Employee"
keyClass="java.lang.String" table="Employee">
+ <primarykey column="ssn"/>
+ <field name="name" column="name"/>
+ <field name="dateOfBirth" column="dateOfBirth"/>
+ <field name="salary" column="salary"/>
+ <field name="boss" column="boss"/>
+ <field name="webpage" column="webpage"/>
+ </class>
</gora-orm>
-Here you can see that we require the definition of two child elements within
the
+Here you can see that we require the definition of only one child element
within the
<code>gora-orm</code> mapping configuration, namely;
-The table element; where we specify:
-
-1. a parameter relating to the HBase table name (String) e.g.
name=<b>"Employee"</b>,
-
-2. a nested element containing the type and definition of families we wish to
create within HBase. In this case we create one family <b>info</b> which could
have a combination of any of the following parameters;
-
- <b>name</b> (String): family name e.g. info
-
- <b>compression</b> (String): the compression option to use in HBase.
Please see <a href="http://hbase.apache.org/book/compression.html">HBase
documentation</a>.
-
- <b>blockCache</b> (boolean): an LRU cache that contains three levels of
block priority to allow for scan-resistance and in-memory ColumnFamilies.
Please see <a
href="https://hbase.apache.org/book/regionserver.arch.html#block.cache">HBase
documentation</a>.
-
- <b>blockSize</b> (Integer): The blocksize can be configured for each
ColumnFamily in a table, and this defaults to 64k. Larger cell values require
larger blocksizes. There is an inverse relationship between blocksize and the
resulting StoreFile indexes (i.e., if the blocksize is doubled then the
resulting indexes should be roughly halved). Please see <a
href="http://hbase.apache.org/book/perf.schema.html#schema.cf.blocksize">HBase
documentation</a>.
-
- <b>bloomFilter</b> (String): Bloom Filters can be enabled
per-ColumnFamily. We use <code>HColumnDescriptor.setBloomFilterType(NONE | ROW
| ROWCOL)</code> to enable blooms per Column Family. Default = NONE for no
bloom filters. If ROW, the hash of the row will be added to the bloom on each
insert. If ROWCOL, the hash of the row + column family name + column family
qualifier will be added to the bloom on each key insert. Please see <a
href="http://hbase.apache.org/book/perf.schema.html#schema.bloom">HBase
documentation</a>.
-
- <b>maxVersions</b> (Integer): The maximum number of row versions to store
is configured per column family via <code>HColumnDescriptor</code>. The default
for max versions is <b>3</b>. This is an important parameter because HBase does
not overwrite row values, but rather stores different values per row by time
(and qualifier). Excess versions are removed during major compaction's. The
number of max versions may need to be increased or decreased depending on
application needs. Please see <a
href="http://hbase.apache.org/book/schema.versions.html">HBase
documentation</a>.
-
- <b>timeToLive</b> (Integer): ColumnFamilies can set a TTL length in
seconds, and HBase will automatically delete rows once the expiration time is
reached. This applies to all versions of a row - even the current one. The TTL
time encoded in the HBase for the row is specified in UTC. Please see <a
href="https://hbase.apache.org/book/ttl.html">HBase documentation</a>.
-
- <b>inMemory</b> (Boolean): ColumnFamilies can optionally be defined as
in-memory. Data is still persisted to disk, just like any other ColumnFamily.
In-memory blocks have the highest priority in the Block Cache, but it is not a
guarantee that the entire table will be in memory. Please see <a
href="http://hbase.apache.org/book/perf.schema.html#cf.in.memory">HBase
documentation</a>.
-
The class element, where we specify the persistent fields to which values
should map. This contains:
-1. a parameter containing the Persistent class name e.g.
<b>org.apache.gora.examples.generated.Employee</b>,
-
-2. a parameter containing the keyClass e.g. <b>java.lang.String</b> which
specifies the keys which map to the field values,
+1. a parameter containing the Persistent class <b>name</b> e.g.
<code>org.apache.gora.examples.generated.Employee</code>,
-3. a parameter containing the Table name e.g. <b>Employee</b> which matches to
the above Table definition,
+2. a parameter containing the <b>keyClass</b> e.g.
<code>java.lang.String</code> which specifies the keys which map to the field
values,
-4. finally nested child element(s) mapping fields which are to be persisted
into HBase. These fields need to be configured such that they receive;
+3. a parameter containing the <b>Table name</b> e.g. <code>Employee</code>,
- a parameter containing the <b>name</b> e.g. (name, dateOfBirth, ssn and
salary respectively),
+4. finally nested child element(s) mapping fields which are to be persisted
into Solr. <b>We must provide a primary key for each object that we wish to
persist into Solr.</b> Additional object fields need to be configured such that
they receive;
- a parameter containing the column <b>family</b> to which they belong e.g.
(all info in this case),
+ a parameter containing the <b>name</b> e.g. (name, dateOfBirth, ssn,
salary, boss and webpage respectively),
+
+ a parameter containing the <b>column</b> to which they map e.g. (name,
dateOfBirth, salary, boss and webpage in this case),
+
+##Solr Schema.xml
+
+<code>schema.xml</code> is an essential aspect of defining a storage and query
model for your Solr data.
+
+The Solr community maintains its own documentation for schema.xml,
which can be found at
[http://wiki.apache.org/solr/SchemaXml](http://wiki.apache.org/solr/SchemaXml).
+
+ <schema name="testexample" version="1.5">
+
+ <fields>
+
+ <!-- Common Fields -->
+ <field name="_version_" type="long" indexed="true" stored="true"/>
+
+ <!-- Employee Fields -->
+ <field name="ssn" type="string" indexed="true" stored="true"
required="true" multiValued="false" />
+ <field name="name" type="string" indexed="true" stored="true" />
+ <field name="dateOfBirth" type="long" stored="true" />
+ <field name="salary" type="int" stored="true" />
+ <field name="boss" type="binary" stored="true" />
+ <field name="webpage" type="binary" stored="true" />
+
+ </fields>
+
+ <uniqueKey>ssn</uniqueKey>
+
+ <types>
+
+ <fieldType name="string" class="solr.StrField" sortMissingLast="true"
/>
+ <fieldType name="int" class="solr.TrieIntField" precisionStep="0"
positionIncrementGap="0"/>
+ <fieldType name="long" class="solr.TrieLongField" precisionStep="0"
positionIncrementGap="0"/>
+ <fieldType name="binary" class="solr.BinaryField"/>
+
+ </types>
+
+ </schema>
+
+##Solr solrconfig.xml
+
+Similar to <code>schema.xml</code> above, <code>solrconfig.xml</code>
documentation is also maintained by the Solr community.
+
+Please see an example configuration below but also please refer to
[http://wiki.apache.org/solr/SolrConfigXml](http://wiki.apache.org/solr/SolrConfigXml).
+
+ <config>
+ <luceneMatchVersion>LUCENE_40</luceneMatchVersion>
+ <dataDir>${solr.data.dir:}</dataDir>
+ <directoryFactory name="DirectoryFactory"
+ class="${solr.directoryFactory:solr.NRTCachingDirectoryFactory}"/>
+ <codecFactory class="solr.SchemaCodecFactory"/>
+ <schemaFactory class="ClassicIndexSchemaFactory"/>
+ <indexConfig>
+ <lockType>${solr.lock.type:native}</lockType>
+ </indexConfig>
+
+ <jmx />
+
+ <updateHandler class="solr.DirectUpdateHandler2">
+ <updateLog>
+ <str name="dir">${solr.ulog.dir:}</str>
+ </updateLog>
+ </updateHandler>
+
+ <query>
+ <maxBooleanClauses>1024</maxBooleanClauses>
+ <filterCache class="solr.FastLRUCache"
+ size="512"
+ initialSize="512"
+ autowarmCount="0"/>
+ <queryResultCache class="solr.LRUCache"
+ size="512"
+ initialSize="512"
+ autowarmCount="0"/>
+ <documentCache class="solr.LRUCache"
+ size="512"
+ initialSize="512"
+ autowarmCount="0"/>
+ <enableLazyFieldLoading>true</enableLazyFieldLoading>
+ <queryResultWindowSize>20</queryResultWindowSize>
+ <queryResultMaxDocsCached>200</queryResultMaxDocsCached>
+ <listener event="newSearcher" class="solr.QuerySenderListener">
+ <arr name="queries">
+ </arr>
+ </listener>
+ <listener event="firstSearcher" class="solr.QuerySenderListener">
+ <arr name="queries">
+ <lst>
+ <str name="q">static firstSearcher warming in
solrconfig.xml</str>
+ </lst>
+ </arr>
+ </listener>
+ <useColdSearcher>false</useColdSearcher>
+ <maxWarmingSearchers>2</maxWarmingSearchers>
+ </query>
+
+ <requestDispatcher handleSelect="false" >
+ <requestParsers enableRemoteStreaming="true"
+ multipartUploadLimitInKB="2048000"
+ formdataUploadLimitInKB="2048"
+ addHttpRequestToContext="false"/>
+ <httpCaching never304="true" />
+ </requestDispatcher>
+
+ <requestHandler name="/select" class="solr.SearchHandler">
+ <lst name="defaults">
+ <str name="echoParams">explicit</str>
+ <int name="rows">10</int>
+ <str name="df">ssn</str>
+ </lst>
+ </requestHandler>
+
+ <requestHandler name="/query" class="solr.SearchHandler">
+ <lst name="defaults">
+ <str name="echoParams">explicit</str>
+ <str name="wt">json</str>
+ <str name="indent">true</str>
+ <str name="df">ssn</str>
+ </lst>
+ </requestHandler>
+
+ <requestHandler name="/get" class="solr.RealTimeGetHandler">
+ <lst name="defaults">
+ <str name="omitHeader">true</str>
+ </lst>
+ </requestHandler>
+
+ <requestHandler name="/update" class="solr.UpdateRequestHandler">
+ </requestHandler>
+ </config>
- an optional parameter <b>qualifier</b>, which enables more granular
control over the data to be persisted into HBase.
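
Reviewer note: the gora.properties entries added in this revision describe several defaults (batchSize of 100, commitWithin of 1000, solrjserver of http). The sketch below shows one way those documented defaults could be resolved from a properties file using only the JDK. The class `SolrStoreSettings` and its methods are hypothetical names invented for illustration; this is not Gora's own implementation, and the validation of the solrjserver value is an assumption based on the four options listed in the documentation.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Arrays;
import java.util.List;
import java.util.Properties;

/** Hypothetical helper (not part of Gora): resolves gora-solr settings with the documented defaults. */
final class SolrStoreSettings {
    // Defaults as stated in the gora-solr documentation.
    static final int DEFAULT_BATCH_SIZE = 100;
    static final int DEFAULT_COMMIT_WITHIN = 1000;
    static final String DEFAULT_SOLRJ_SERVER = "http";
    // The four solrj options listed in the documentation.
    static final List<String> SOLRJ_SERVERS =
            Arrays.asList("http", "cloud", "concurrent", "loadbalance");

    final String url;
    final int batchSize;
    final int commitWithin;
    final String solrjServer;

    private SolrStoreSettings(String url, int batchSize, int commitWithin, String solrjServer) {
        this.url = url;
        this.batchSize = batchSize;
        this.commitWithin = commitWithin;
        this.solrjServer = solrjServer;
    }

    /** Reads the gora.solrstore.solr.* keys, falling back to the documented defaults. */
    static SolrStoreSettings fromProperties(Properties props) {
        String url = props.getProperty("gora.solrstore.solr.url");
        int batchSize = intOrDefault(props, "gora.solrstore.solr.batchSize", DEFAULT_BATCH_SIZE);
        int commitWithin = intOrDefault(props, "gora.solrstore.solr.commitWithin", DEFAULT_COMMIT_WITHIN);
        String server = props.getProperty("gora.solrstore.solr.solrjserver", DEFAULT_SOLRJ_SERVER).trim();
        if (!SOLRJ_SERVERS.contains(server)) {
            // Assumption: unknown values are rejected here; Gora itself may behave differently.
            throw new IllegalArgumentException("Unknown solrjserver value: " + server);
        }
        return new SolrStoreSettings(url, batchSize, commitWithin, server);
    }

    /** Returns the integer value for key, or fallback when the key is absent or blank. */
    private static int intOrDefault(Properties props, String key, int fallback) {
        String raw = props.getProperty(key);
        if (raw == null || raw.trim().isEmpty()) {
            return fallback;
        }
        return Integer.parseInt(raw.trim()); // NumberFormatException if not an Integer
    }

    public static void main(String[] args) throws IOException {
        Properties props = new Properties();
        props.load(new StringReader(
                "gora.datastore.default=org.apache.gora.solr.store.SolrStore\n"
              + "gora.solrstore.solr.url=http://localhost:9876/solr\n"
              + "gora.solrstore.solr.batchSize=250\n"));
        SolrStoreSettings s = fromProperties(props);
        System.out.println(s.url + " batchSize=" + s.batchSize
                + " commitWithin=" + s.commitWithin + " solrjserver=" + s.solrjServer);
    }
}
```

In this sketch only batchSize is overridden in the sample properties, so commitWithin and solrjserver fall back to the documented defaults.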