This is an automated email from the ASF dual-hosted git repository. shuber pushed a commit to branch UNOMI-240-document-profile-import-export in repository https://gitbox.apache.org/repos/asf/unomi.git
commit ee8c9ae3d7968da89894d14e48cda2eae5943d77
Author: Serge Huber <[email protected]>
AuthorDate: Thu Aug 22 21:10:21 2019 +0200

    UNOMI-240 Document profile import/export

    Signed-off-by: Serge Huber <[email protected]>
---
 manual/src/main/asciidoc/index.adoc | 4 +
 .../src/main/asciidoc/profile-import-export.adoc | 239 +++++++++++++++++++++
 2 files changed, 243 insertions(+)

diff --git a/manual/src/main/asciidoc/index.adoc b/manual/src/main/asciidoc/index.adoc
index f611a7a..31a5972 100644
--- a/manual/src/main/asciidoc/index.adoc
+++ b/manual/src/main/asciidoc/index.adoc
@@ -48,6 +48,10 @@ include::useful-unomi-urls.adoc[]
 
 include::how-profile-tracking-works.adoc[]
 
+== Profile import & export
+
+include::profile-import-export.adoc[]
+
 == Consent management
 
 include::consent-api.adoc[]
diff --git a/manual/src/main/asciidoc/profile-import-export.adoc b/manual/src/main/asciidoc/profile-import-export.adoc
new file mode 100644
index 0000000..67db681
--- /dev/null
+++ b/manual/src/main/asciidoc/profile-import-export.adoc
@@ -0,0 +1,239 @@
+//
+// Licensed under the Apache License, Version 2.0 (the "License");
+// you may not use this file except in compliance with the License.
+// You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+//
+The profile import and export feature in Apache Unomi is based on configurations and consumes or produces CSV files that
+contain the profiles to import and export.
+
+=== Importing profiles
+
+Only `ftp`, `sftp`, `ftps` and `file` are supported in the source path.
For example:
+
+    file:///tmp/?fileName=profiles.csv&move=.done&consumer.delay=25s
+
+Where:
+
+- `fileName` is the name of the file to consume. To consume all CSV files, use a pattern such as `include=.*.csv` instead of `fileName=...`.
+By default, processed files are moved to the `.camel` folder; you can change this with the `move` option.
+- `consumer.delay` is the polling frequency in milliseconds. For example, `20000` means 20 seconds. The same value can
+also be written as `20s`. Compound values are possible too: `2h30m10s` means 2 hours, 30 minutes and 10 seconds.
+
+See http://camel.apache.org/ftp.html and http://camel.apache.org/file2.html to build a more complex source path. Be
+careful with FTP configuration: most servers no longer accept clear-text FTP, so you should use SFTP or FTPS
+instead, although they are a little more difficult to configure properly. It is recommended to test the connection with an
+FTP client before setting up these source paths, to make sure everything works. Also, most servers require passive
+mode for FTP connections; you can request it in the path with the `passiveMode=true` parameter.
+
+Here are some examples of FTPS and SFTP source paths:
+
+    sftp://USER@HOST/PATH?password=PASSWORD&include=.*.csv
+    ftps://USER@HOST?password=PASSWORD&fileName=profiles.csv&passiveMode=true
+
+Where:
+
+- `USER` is the user name of the SFTP/FTPS account to log in with
+- `PASSWORD` is the password for the user account
+- `HOST` is the host name (or IP address) of the SFTP/FTPS server
+- `PATH` is a path to a directory inside the user's account from which the file will be retrieved.
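The `consumer.delay` formats described above (`20000`, `20s`, `2h30m10s`) can be illustrated with a small Python sketch. This is only an approximation of the documented notation, written to show how the values relate to each other; it is not the parser used by Apache Camel or Apache Unomi, and the function name `delay_to_millis` is made up for this example.

```python
import re

# Milliseconds per unit suffix, for the suffixes mentioned in the text above.
_UNIT_MS = {"h": 3_600_000, "m": 60_000, "s": 1_000}


def delay_to_millis(value: str) -> int:
    """Convert a delay such as '20000', '20s' or '2h30m10s' to milliseconds.

    Hypothetical helper: it mirrors the notation described in the manual,
    not the actual parsing rules of the underlying library.
    """
    value = value.strip()
    if value.isdigit():
        # A bare number is already a millisecond count.
        return int(value)
    # Optional hours, minutes and seconds, in that order.
    match = re.fullmatch(r"(?:(\d+)h)?(?:(\d+)m)?(?:(\d+)s)?", value)
    if not match or not any(match.groups()):
        raise ValueError(f"unsupported delay format: {value!r}")
    hours, minutes, seconds = (int(g) if g else 0 for g in match.groups())
    return (hours * _UNIT_MS["h"]
            + minutes * _UNIT_MS["m"]
            + seconds * _UNIT_MS["s"])


if __name__ == "__main__":
    for sample in ("20000", "20s", "2h30m10s"):
        print(sample, "->", delay_to_millis(sample), "ms")
```

With this reading, `20000` and `20s` denote the same polling interval, which matches the equivalence stated in the text.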
+
+==== Import API
+
+Apache Unomi provides REST endpoints to manage import configurations:
+
+    GET /cxs/importConfiguration
+    GET /cxs/importConfiguration/{configId}
+    POST /cxs/importConfiguration
+    DELETE /cxs/importConfiguration/{configId}
+
+This is what a oneshot import configuration looks like:
+
+    {
+        "itemId": "importConfigId",
+        "itemType": "importConfig",
+        "name": "Import Config Sample",
+        "description": "Sample description",
+        "configType": "oneshot", //Config type can be 'oneshot' or 'recurrent'
+        "properties": {
+            "mapping": {
+                "email": 0, //<Apache Unomi Property Id> : <Column Index In the CSV>
+                "firstName": 2,
+                ...
+            }
+        },
+        "columnSeparator": ",", //Character used to separate columns
+        "lineSeparator": "\\n", //Character used to separate lines (\n or \r)
+        "multiValueSeparator": ";", //Character used to separate values for multivalued columns
+        "multiValueDelimiter": "[]", //Characters used to wrap values for multivalued columns
+        "status": "SUCCESS", //Status of last execution
+        "executions": [ //(RETURN) Last executions; by default only the last 5 are returned
+            ...
+        ],
+        "mergingProperty": "email", //Apache Unomi Property Id used to check duplicates
+        "overwriteExistingProfiles": true, //Overwrite profiles that have duplicates
+        "propertiesToOverwrite": "firstName, lastName, ...", //If 'overwriteExistingProfiles' is true, which properties to overwrite; 'null' means overwrite all
+        "hasHeader": true, //CSV file to import contains a header line
+        "hasDeleteColumn": false //CSV file to import doesn't contain a TO DELETE column (if present, it is the last column)
+    }
+
+A recurrent import configuration is similar to the previous one, with some additional information in the JSON:
+
+    {
+        ...
+        "configType": "recurrent",
+        "properties": {
+            "source": "ftp://USER@SERVER[:PORT]/PATH?password=xxx&fileName=profiles.csv&move=.done&consumer.delay=20000",
+            // Only 'ftp', 'sftp', 'ftps' and 'file' are supported in the 'source' path
+            // e.g.
file:///tmp/?fileName=profiles.csv&move=.done&consumer.delay=25s
+            // To consume all CSV files, use a pattern such as 'include=.*.csv' instead of 'fileName=...'
+            // By default, processed files are moved to the '.camel' folder; you can change this with the 'move' option
+            // 'consumer.delay' is the polling frequency. '20000' (in milliseconds) means 20 seconds; '20s' is equivalent
+            // Other possible formats include '2h30m10s' = 2 hours, 30 minutes and 10 seconds
+            "mapping": {
+                ...
+            }
+        },
+        ...
+        "active": true, //If true, polling starts according to the 'source' configured above
+        ...
+    }
+
+
+=== Exporting profiles
+
+Only `ftp`, `sftp`, `ftps` and `file` are supported in the destination path. For example:
+
+    file:///tmp/?fileName=profiles-export-${date:now:yyyyMMddHHmm}.csv&fileExist=Append
+    sftp://USER@HOST/PATH?password=PASSWORD&binary=true&fileName=profiles-export-${date:now:yyyyMMddHHmm}.csv&fileExist=Append
+    ftps://USER@HOST?password=PASSWORD&binary=true&fileName=profiles-export-${date:now:yyyyMMddHHmm}.csv&fileExist=Append&passiveMode=true
+
+As you can see in the examples above, you can inject variables into the produced file name: `${date:now:yyyyMMddHHmm}` is
+the current date formatted with the pattern `yyyyMMddHHmm`. Setting the `fileExist` option to `Append` tells the file writer
+to append to the same file on each execution of the export configuration. You can omit this option to write a profile
+per file.
+
+See http://camel.apache.org/ftp.html and http://camel.apache.org/file2.html to build a more complex destination path.
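To make the `${date:now:yyyyMMddHHmm}` placeholder more concrete, the following Python sketch shows what the resulting file name would look like for a given timestamp. The real expansion is performed by the underlying file component, not by this code; the helper name `expand_file_name` and the small pattern table are assumptions made for illustration, and only the pattern letters used in the examples above are handled.

```python
import re
from datetime import datetime

# Assumed correspondence between the pattern letters used in the examples
# and Python strftime directives. Only these five tokens are covered.
_TOKEN_TO_STRFTIME = {"yyyy": "%Y", "MM": "%m", "dd": "%d", "HH": "%H", "mm": "%M"}
_PLACEHOLDER = re.compile(r"\$\{date:now:([^}]*)\}")


def expand_file_name(template: str, now: datetime) -> str:
    """Replace each '${date:now:PATTERN}' placeholder with the formatted date."""

    def _format(match: re.Match) -> str:
        # Translate the date pattern in a single pass, token by token.
        strftime_pattern = re.sub(
            r"yyyy|MM|dd|HH|mm",
            lambda token: _TOKEN_TO_STRFTIME[token.group()],
            match.group(1),
        )
        return now.strftime(strftime_pattern)

    return _PLACEHOLDER.sub(_format, template)


if __name__ == "__main__":
    name = expand_file_name(
        "profiles-export-${date:now:yyyyMMddHHmm}.csv",
        datetime(2019, 8, 22, 21, 10),
    )
    print(name)
```

Because the pattern only goes down to the minute, two executions within the same minute produce the same file name, which is where the `fileExist=Append` option becomes relevant.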
+
+==== Export API
+
+Apache Unomi provides REST endpoints to manage export configurations:
+
+    GET /cxs/exportConfiguration
+    GET /cxs/exportConfiguration/{configId}
+    POST /cxs/exportConfiguration
+    DELETE /cxs/exportConfiguration/{configId}
+
+This is what a oneshot export configuration looks like:
+
+    {
+        "itemId": "exportConfigId",
+        "itemType": "exportConfig",
+        "name": "Export configuration sample",
+        "description": "Sample description",
+        "configType": "oneshot",
+        "properties": {
+            "period": "2m30s",
+            "segment": "contacts",
+            "mapping": {
+                "0": "firstName",
+                "1": "lastName",
+                ...
+            }
+        },
+        "columnSeparator": ",",
+        "lineSeparator": "\\n",
+        "multiValueSeparator": ";",
+        "multiValueDelimiter": "[]",
+        "status": "RUNNING",
+        "executions": [
+            ...
+        ]
+    }
+
+A recurrent export configuration is similar to the previous one, with some additional information in the JSON:
+
+    {
+        ...
+        "configType": "recurrent",
+        "properties": {
+            "destination": "sftp://USER@SERVER:PORT/PATH?password=XXX&fileName=profiles-export-${date:now:yyyyMMddHHmm}.csv&fileExist=Append",
+            "period": "2m30s", //Same as 'consumer.delay' option in the import source path
+            "segment": "contacts", //Segment ID to use to collect profiles to export
+            "mapping": {
+                ...
+            }
+        },
+        ...
+        "active": true, //If true, the configuration starts polling upon save until the user deactivates it
+        ...
+    }
+
+=== Configuration in detail
+
+The first setting you need to change is the configuration type of the import/export feature (code name
+`router`) in the `etc/unomi.custom.system.properties` file (create it if necessary):
+
+    #Configuration Type values {'nobroker', 'kafka'}
+    org.apache.unomi.router.config.type=nobroker
+
+By default the feature is configured (as above) to use no external broker, which means that import/export data is
+handled with in-memory queues (in the same JVM as Apache Unomi).
If you are clustering Apache Unomi, the most important thing
+to know about this type of configuration is that each Apache Unomi node handles the import/export task by itself, without
+the help of other nodes (no load distribution).
+
+Changing this property to `kafka` means you have to provide the Apache Kafka configuration. Unlike the
+`nobroker` option, import/export data is then handled by an external broker (Apache Kafka), which lightens the burden
+on the Apache Unomi machines.
+
+You may use several Apache Kafka instances, one per N Apache Unomi nodes, for better application scaling.
+
+To enable Apache Kafka, configure the feature as follows:
+
+    #Configuration Type values {'nobroker', 'kafka'}
+    org.apache.unomi.router.config.type=kafka
+
+    #Uncomment and update Kafka settings to use Kafka as a broker
+
+    #Kafka
+    org.apache.unomi.router.kafka.host=localhost
+    org.apache.unomi.router.kafka.port=9092
+    org.apache.unomi.router.kafka.import.topic=import-deposit
+    org.apache.unomi.router.kafka.export.topic=export-deposit
+    org.apache.unomi.router.kafka.import.groupId=unomi-import-group
+    org.apache.unomi.router.kafka.export.groupId=unomi-import-group
+    org.apache.unomi.router.kafka.consumerCount=10
+    org.apache.unomi.router.kafka.autoCommit=true
+
+There are a couple of properties you may want to change to fit your needs. One of them is `import.oneshot.uploadDir`, which
+tells Apache Unomi where to temporarily store the CSV files to import in oneshot mode. It is a technical property
+that lets you choose a convenient disk location for the files to import. It defaults to the following path
+inside the Apache Unomi Karaf installation (it is recommended to change the path to a more convenient one).
+
+    #Import One Shot upload directory
+    org.apache.unomi.router.import.oneshot.uploadDir=${karaf.data}/tmp/unomi_oneshot_import_configs/
+
+The next two properties are the maximum sizes of the executions history and of the error reports. You may not want Apache Unomi
+to report the full executions history and every error report generated by the executions of an import/export configuration.
+To change these limits, change the default values of these properties.
+
+    #Import/Export executions history size
+    org.apache.unomi.router.executionsHistory.size=5
+
+    #errors report size
+    org.apache.unomi.router.executions.error.report.size=200
+
+The final one lists the endpoints allowed when building the source or destination path. As mentioned above,
+a path can be of type `file`, `ftp`, `ftps` or `sftp`. You can shorten the list if you want to exclude some endpoints (e.g.
+you don't want to permit the use of non-secure FTP).
+
+    #Allowed source endpoints
+    org.apache.unomi.router.config.allowedEndpoints=file,ftp,sftp,ftps
+
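The effect of the `allowedEndpoints` property can be sketched in Python as a simple scheme check. This is an illustration of the idea only, assuming the property value is a comma-separated list of URI schemes as shown above; it does not reproduce how Apache Unomi actually validates paths, and the function names are hypothetical.

```python
from __future__ import annotations

from urllib.parse import urlsplit


def allowed_schemes(property_value: str) -> set[str]:
    """Parse a comma-separated value such as 'file,ftp,sftp,ftps'."""
    return {item.strip().lower() for item in property_value.split(",") if item.strip()}


def is_endpoint_allowed(path: str, property_value: str) -> bool:
    """Return True when the scheme of a source/destination path is allowed.

    Hypothetical check: it only compares the URI scheme against the list.
    """
    return urlsplit(path).scheme.lower() in allowed_schemes(property_value)


if __name__ == "__main__":
    # Example: plain FTP removed from the allowed list.
    restricted = "file,sftp,ftps"
    for candidate in (
        "sftp://USER@HOST/PATH?password=PASSWORD&include=.*.csv",
        "ftp://USER@HOST/PATH?password=PASSWORD&fileName=profiles.csv",
        "file:///tmp/?fileName=profiles.csv",
    ):
        print(candidate.split(":", 1)[0], is_endpoint_allowed(candidate, restricted))
```

Removing `ftp` from the list, as in the usage example, corresponds to the case mentioned in the text where non-secure FTP should not be permitted.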
