[ 
https://issues.apache.org/jira/browse/SAMOA-40?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14629043#comment-14629043
 ] 

ASF GitHub Bot commented on SAMOA-40:
-------------------------------------

GitHub user karande opened a pull request:

    https://github.com/apache/incubator-samoa/pull/32

    SAMOA-40: Add Kafka stream reader modules to consume data from Kafka …

    …framework
    
    Apache SAMOA is designed for processing streaming data and developing
    streaming machine learning algorithms. Currently, the SAMOA framework
    can read stream data only from ARFF files. As a result, when SAMOA is
    used as a streaming machine learning component in real-time use cases,
    data has to be written to and read back from files, which is slow and
    inefficient.
    
    A single Kafka broker can handle hundreds of megabytes of reads and
    writes per second from thousands of clients. The ability to read data
    directly from Apache Kafka into SAMOA will not only improve performance
    but also make SAMOA pluggable into many real-time machine learning use
    cases, such as the Internet of Things (IoT).
    
    GOAL:
    Add code that enables SAMOA to read data from Apache Kafka as a data
    stream.
    The Kafka stream reader supports the following streaming options:
    
    a) Topic selection - the Kafka topic to read data from
    b) Partition selection - the Kafka partition to read data from
    c) Batching - the number of data instances fetched in a single read
    request to Kafka
    d) Configuration options - Kafka port number, seed information, and the
    time delay between two read requests
    
    Components:
    KafkaReader - Provides the APIs to read data from Kafka
    KafkaStream - Stream source for SAMOA that provides the data read from Kafka
    Dependencies for Kafka are added to pom.xml in the samoa-api component.
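
The options listed under GOAL can be pictured with a minimal Java sketch. This is not the code from the pull request: `KafkaReadOptions` and `BatchingSketch` are hypothetical names, "seed information" is assumed here to mean a list of seed broker hosts, and a plain `Iterator` stands in for the Kafka fetch so that the example runs without a broker.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;

/** Hypothetical holder for the reader options; not the PR's actual class. */
final class KafkaReadOptions {
    final String topic;             // a) topic to read from
    final int partition;            // b) partition to read from
    final int batchSize;            // c) instances fetched per read request
    final List<String> seedBrokers; // d) assumed meaning of "seed information"
    final int port;                 // d) Kafka port number
    final long delayMillis;         // d) delay between two read requests

    KafkaReadOptions(String topic, int partition, int batchSize,
                     List<String> seedBrokers, int port, long delayMillis) {
        if (topic == null || topic.isEmpty()) {
            throw new IllegalArgumentException("topic must be set");
        }
        if (partition < 0 || batchSize <= 0 || delayMillis < 0) {
            throw new IllegalArgumentException("invalid partition, batch size, or delay");
        }
        this.topic = topic;
        this.partition = partition;
        this.batchSize = batchSize;
        this.seedBrokers = Collections.unmodifiableList(new ArrayList<>(seedBrokers));
        this.port = port;
        this.delayMillis = delayMillis;
    }
}

/** Illustrative batching loop; the Iterator stands in for a Kafka fetch. */
final class BatchingSketch {
    static List<List<String>> readInBatches(Iterator<String> source, KafkaReadOptions opts) {
        List<List<String>> batches = new ArrayList<>();
        while (source.hasNext()) {
            // Collect at most batchSize records for this "read request".
            List<String> batch = new ArrayList<>(opts.batchSize);
            while (batch.size() < opts.batchSize && source.hasNext()) {
                batch.add(source.next());
            }
            batches.add(batch);
            if (source.hasNext()) {
                try {
                    // Wait between two consecutive read requests.
                    Thread.sleep(opts.delayMillis);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    break; // stop reading if the thread was interrupted
                }
            }
        }
        return batches;
    }

    public static void main(String[] args) {
        KafkaReadOptions opts = new KafkaReadOptions(
                "samoa-instances", 0, 2, Arrays.asList("localhost"), 9092, 10L);
        List<String> records = Arrays.asList("r1", "r2", "r3", "r4", "r5");
        // prints [[r1, r2], [r3, r4], [r5]]
        System.out.println(readInBatches(records.iterator(), opts));
    }
}
```

Keeping the options in one immutable object is only one possible design; it lets a stream source such as KafkaStream pass the same settings to the reader on every request.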

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/karande/incubator-samoa master

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-samoa/pull/32.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #32
    
----
commit 768306b2e832671f37fb0b1d1a009fcb07807ad3
Author: Vishal Karande <[email protected]>
Date:   2015-07-16T01:13:51Z

    SAMOA-40: Add Kafka stream reader modules to consume data from Kafka framework
    

----


> Add Kafka stream reader modules to consume data from Kafka framework
> --------------------------------------------------------------------
>
>                 Key: SAMOA-40
>                 URL: https://issues.apache.org/jira/browse/SAMOA-40
>             Project: SAMOA
>          Issue Type: Task
>          Components: Infrastructure, SAMOA-API
>         Environment: OS X Version 10.10.3
>            Reporter: Vishal Karande
>            Priority: Minor
>              Labels: features
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> Apache SAMOA is designed for processing streaming data and developing
> streaming machine learning algorithms. Currently, the SAMOA framework can
> read stream data only from ARFF files. As a result, when SAMOA is used as a
> streaming machine learning component in real-time use cases, data has to be
> written to and read back from files, which is slow and inefficient.
> A single Kafka broker can handle hundreds of megabytes of reads and writes
> per second from thousands of clients. The ability to read data directly from
> Apache Kafka into SAMOA will not only improve performance but also make
> SAMOA pluggable into many real-time machine learning use cases, such as the
> Internet of Things (IoT).
> GOAL:
> Add code that enables SAMOA to read data from Apache Kafka as a data stream.
> The Kafka stream reader supports the following streaming options:
> a) Topic selection - the Kafka topic to read data from
> b) Partition selection - the Kafka partition to read data from
> c) Batching - the number of data instances fetched in a single read request
> to Kafka
> d) Configuration options - Kafka port number, seed information, and the time
> delay between two read requests
> Components:
> KafkaReader - Provides the APIs to read data from Kafka
> KafkaStream - Stream source for SAMOA that provides the data read from Kafka
> Dependencies for Kafka are added to pom.xml in the samoa-api component.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
