abdullah alamoudi has uploaded a new change for review. https://asterix-gerrit.ics.uci.edu/802
Change subject: Add List of Supported Adapters to Doc
......................................................................

Add List of Supported Adapters to Doc

Change-Id: I2bb98477e144e78e9983d33f9dd2f89a547aeccf
---
M asterixdb/asterix-doc/src/site/markdown/aql/externaldata.md
M asterixdb/asterix-external-data/src/main/java/org/apache/asterix/external/provider/DatasourceFactoryProvider.java
2 files changed, 59 insertions(+), 1 deletion(-)

git pull ssh://asterix-gerrit.ics.uci.edu:29418/asterixdb refs/changes/02/802/1

diff --git a/asterixdb/asterix-doc/src/site/markdown/aql/externaldata.md b/asterixdb/asterix-doc/src/site/markdown/aql/externaldata.md
index d5281cb..e919bd0 100644
--- a/asterixdb/asterix-doc/src/site/markdown/aql/externaldata.md
+++ b/asterixdb/asterix-doc/src/site/markdown/aql/externaldata.md
@@ -23,6 +23,7 @@
 * [Introduction](#Introduction)
 * [Adapter for an External Dataset](#IntroductionAdapterForAnExternalDataset)
+* [Builtin Adapters](#BuiltinAdapters)
 * [Creating an External Dataset](#IntroductionCreatingAnExternalDataset)
 * [Writing Queries against an External Dataset](#WritingQueriesAgainstAnExternalDataset)
 * [Building Indexes over External Datasets](#BuildingIndexesOverExternalDatasets)
@@ -35,8 +36,62 @@
 ### <a id="IntroductionAdapterForAnExternalDataset">Adapter for an External Dataset</a> <font size="4"><a href="#toc">[Back to TOC]</a></font> ###
 External data is accessed using wrappers (adapters in AsterixDB) that abstract away the mechanism of connecting with an external service, receiving its data and transforming the data into ADM records that are understood by AsterixDB. AsterixDB comes with built-in adapters for common storage systems such as HDFS or the local file system.
-### <a id="IntroductionCreatingAnExternalDataset">Creating an External Dataset</a> <font size="4"><a href="#toc">[Back to TOC]</a></font> ###
+### <a id="BuiltinAdapters">Builtin Adapters</a> <font size="4"><a href="#toc">[Back to TOC]</a></font> ###
+AsterixDB offers a set of built-in adapters that can be used to query external data or to load data into an internal dataset using a load statement or a data feed. Each adapter requires the format of the data to be specified so that records can be parsed correctly. When an adapter is used with a feed, the output-type parameter must also be specified.
+The following is a list of the built-in adapters and their configuration parameters:
+<ol>
+  <li>localfs: used for reading data stored in the local file system of one or more node controllers
+    <ul>
+      <li>path: A fully qualified path of the form host://&lt;absolute_path&gt;. Use a comma-separated list if there are multiple directories or files</li>
+      <li>expression: A regular expression to match and filter against file names</li>
+    </ul>
+  </li>
+  <li>hdfs: used for reading data stored in an HDFS instance
+    <ul>
+      <li>path: A fully qualified path of the form host://&lt;absolute_path&gt;. Use a comma-separated list if there are multiple directories or files</li>
+      <li>expression: A regular expression to match and filter against file names</li>
+      <li>input-format: A fully qualified class name, or an alias, of an HDFS input format</li>
+      <li>hdfs: The HDFS name node URL</li>
+    </ul>
+  </li>
+  <li>socket: used for listening for connections that send data streams through one or more sockets
+    <ul>
+      <li>sockets: comma-separated list of sockets to listen on</li>
+      <li>address-type: either IP if the list uses IP addresses, or NC if the list uses node controller (NC) names</li>
+    </ul>
+  </li>
+  <li>socket_client: used for connecting to one or more sockets and reading data streams from them
+    <ul>
+      <li>sockets: comma-separated list of sockets to connect to</li>
+    </ul>
+  </li>
+  <li>twitter_push: used for establishing a connection and subscribing to a Twitter feed
+    <ul>
+      <li>consumer.key: OAuth access parameter provided by Twitter</li>
+      <li>consumer.secret: OAuth access parameter provided by Twitter</li>
+      <li>access.token: OAuth access parameter provided by Twitter</li>
+      <li>access.token.secret: OAuth access parameter provided by Twitter</li>
+    </ul>
+  </li>
+  <li>twitter_pull: used for polling Twitter for tweets at a configurable frequency
+    <ul>
+      <li>consumer.key: OAuth access parameter provided by Twitter</li>
+      <li>consumer.secret: OAuth access parameter provided by Twitter</li>
+      <li>access.token: OAuth access parameter provided by Twitter</li>
+      <li>access.token.secret: OAuth access parameter provided by Twitter</li>
+      <li>query: Twitter query string</li>
+      <li>interval: poll interval in seconds</li>
+    </ul>
+  </li>
+  <li>rss: used for reading RSS feeds
+    <ul>
+      <li>url: a comma-separated list of RSS feed URLs</li>
+    </ul>
+  </li>
+</ol>
+
+### <a id="IntroductionCreatingAnExternalDataset">Creating an External Dataset</a> <font size="4"><a href="#toc">[Back to TOC]</a></font> ###
 As an example we consider the Lineitem dataset from the [TPCH schema](http://www.openlinksw.com/dataspace/doc/dav/wiki/Main/VOSTPCHLinkedData/tpch.sql).
 We assume that you have successfully created an AsterixDB instance following the instructions at [Installing AsterixDB Using Managix](../install.html).
 _For constructing an example, we assume a single machine setup.._
diff --git a/asterixdb/asterix-external-data/src/main/java/org/apache/asterix/external/provider/DatasourceFactoryProvider.java b/asterixdb/asterix-external-data/src/main/java/org/apache/asterix/external/provider/DatasourceFactoryProvider.java
index 0f24f91..bd50c39 100644
--- a/asterixdb/asterix-external-data/src/main/java/org/apache/asterix/external/provider/DatasourceFactoryProvider.java
+++ b/asterixdb/asterix-external-data/src/main/java/org/apache/asterix/external/provider/DatasourceFactoryProvider.java
@@ -29,6 +29,7 @@
 import org.apache.asterix.external.input.record.reader.RecordWithPKTestReaderFactory;
 import org.apache.asterix.external.input.record.reader.kv.KVReaderFactory;
 import org.apache.asterix.external.input.record.reader.kv.KVTestReaderFactory;
+import org.apache.asterix.external.input.record.reader.rss.RSSRecordReaderFactory;
 import org.apache.asterix.external.input.record.reader.stream.StreamRecordReaderFactory;
 import org.apache.asterix.external.input.record.reader.twitter.TwitterRecordReaderFactory;
 import org.apache.asterix.external.input.stream.factory.LocalFSInputStreamFactory;
@@ -108,6 +109,8 @@
                 return new StreamRecordReaderFactory(new SocketServerInputStreamFactory());
             case ExternalDataConstants.STREAM_SOCKET_CLIENT:
                 return new StreamRecordReaderFactory(new SocketClientInputStreamFactory());
+            case ExternalDataConstants.READER_RSS:
+                return new RSSRecordReaderFactory();
             default:
                 throw new AsterixException("unknown record reader factory: " + reader);
         }

--
To view, visit https://asterix-gerrit.ics.uci.edu/802
To unsubscribe, visit https://asterix-gerrit.ics.uci.edu/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I2bb98477e144e78e9983d33f9dd2f89a547aeccf
Gerrit-PatchSet: 1
Gerrit-Project: asterixdb
Gerrit-Branch: master
Gerrit-Owner: abdullah alamoudi <bamou...@gmail.com>
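To make the parameter rules in the new adapter listing easier to check during review, here is a minimal Java sketch of a configuration check. `AdapterConfigCheck` and `missingParameters` are hypothetical names and are not part of AsterixDB. The sketch assumes that every listed parameter except `expression` is mandatory, which the documentation does not state explicitly; the `format` rule (all adapters) and the `output-type` rule (feeds only) come from the section's introductory paragraph.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical helper, not part of AsterixDB: checks an adapter configuration
 * against the parameters listed in the "Builtin Adapters" documentation.
 * Assumption: every listed parameter except "expression" is treated as required.
 */
public class AdapterConfigCheck {

    // OAuth keys shared by twitter_push and twitter_pull.
    private static final List<String> TWITTER_OAUTH = Arrays.asList(
            "consumer.key", "consumer.secret", "access.token", "access.token.secret");

    // Adapter name -> parameters assumed to be required (beyond "format").
    private static final Map<String, List<String>> REQUIRED = new HashMap<>();

    static {
        REQUIRED.put("localfs", Arrays.asList("path"));
        REQUIRED.put("hdfs", Arrays.asList("path", "input-format", "hdfs"));
        REQUIRED.put("socket", Arrays.asList("sockets", "address-type"));
        REQUIRED.put("socket_client", Arrays.asList("sockets"));
        REQUIRED.put("twitter_push", TWITTER_OAUTH);
        List<String> pull = new ArrayList<>(TWITTER_OAUTH);
        pull.add("query");
        pull.add("interval");
        REQUIRED.put("twitter_pull", pull);
        REQUIRED.put("rss", Arrays.asList("url"));
    }

    /** Returns the names of missing parameters, in a stable order. */
    public static List<String> missingParameters(String adapter, Map<String, String> config,
            boolean forFeed) {
        List<String> required = REQUIRED.get(adapter);
        if (required == null) {
            throw new IllegalArgumentException("unknown adapter: " + adapter);
        }
        List<String> expected = new ArrayList<>();
        expected.add("format");              // every adapter needs the data format
        if (forFeed) {
            expected.add("output-type");     // feeds must also declare the output type
        }
        expected.addAll(required);
        List<String> missing = new ArrayList<>();
        for (String name : expected) {
            if (!config.containsKey(name)) {
                missing.add(name);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        Map<String, String> feed = new HashMap<>();
        feed.put("sockets", "127.0.0.1:10001");
        feed.put("format", "adm");
        // A socket_client feed without output-type:
        System.out.println(missingParameters("socket_client", feed, true)); // prints [output-type]
    }
}
```

Keeping the required keys in a map keyed by adapter name mirrors the name-based dispatch in `DatasourceFactoryProvider`, so documenting a new adapter such as `rss` amounts to adding one entry.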