[ 
https://issues.apache.org/jira/browse/SAMOA-47?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14996192#comment-14996192
 ] 

ASF GitHub Bot commented on SAMOA-47:
-------------------------------------

Github user gdfm commented on a diff in the pull request:

    https://github.com/apache/incubator-samoa/pull/40#discussion_r44249782
  
    --- Diff: samoa-instances/src/main/java/org/apache/samoa/instances/AvroLoader.java ---
    @@ -0,0 +1,285 @@
    +package org.apache.samoa.instances;
    +
    +/*
    + * #%L
    + * SAMOA
    + * %%
    + * Copyright (C) 2014 - 2015 Apache Software Foundation
    + * %%
    + * Licensed under the Apache License, Version 2.0 (the "License");
    + * you may not use this file except in compliance with the License.
    + * You may obtain a copy of the License at
    + * 
    + *      http://www.apache.org/licenses/LICENSE-2.0
    + * 
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + * #L%
    + */
    +
    +
    +import java.io.InputStream;
    +import java.util.ArrayList;
    +import java.util.List;
    +
    +import org.apache.avro.Schema;
    +import org.apache.avro.Schema.Field;
    +import org.apache.avro.generic.GenericData.EnumSymbol;
    +import org.apache.avro.generic.GenericRecord;
    +import org.apache.avro.io.DatumReader;
    +
    +/**
    + * Load Data from Avro Stream and parse to corresponding Dense & Sparse Instances
    + * Abstract Class: Subclass this class for different types of Avro Encodings
    + * 
    + *
    + */
    +public abstract class AvroLoader implements Loader {
    +
    +   private static final long serialVersionUID = 1L;
    +
    +   /** Representation of the Avro Schema for the Instances being read. Built from the first line in the data  */
    +   protected Schema schema = null;
    +
    +   /**  Meta-data of the Instance */
    +   protected InstanceInformation instanceInformation;
    +
    +   /** List of attributes in the data as read from the schema */
    +   protected List<Attribute> attributes;
    +
    +   /** This variable is to check if the data stored is Sparse or Dense */
    +   protected boolean isSparseData;
    +
    +   protected int classAttribute;
    +
    +   /** Datum Reader for Avro Data*/
    +   public DatumReader<GenericRecord> datumReader = null;
    +
    +   public AvroLoader(int classAttribute) {
    +           this.classAttribute = classAttribute;
    +           this.isSparseData = false;
    +   }
    +
    +   /** Initialize Avro Schema, Meta Data, InstanceInformation from Input Avro Stream */
    +   public abstract void initializeSchema(InputStream inputStream);
    +
    +   /** Read a single SAMOA Instance from Input Avro Stream */
    +   public abstract Instance readInstance();
    +   
    +   /**
    +    * Method to read Dense Instances from Avro File
    +    * @return Instance
    +    */
    +   protected Instance readInstanceDense(GenericRecord record)
    +   {
    +           Instance instance = new DenseInstance(this.instanceInformation.numAttributes() + 1);
    +           int numAttribute = 0;
    +
    +           for (Attribute attribute : attributes) {
    +                   Object value = record.get(attribute.name);
    +
    +                   boolean isNumeric = attributes.get(numAttribute).isNumeric();
    +                   boolean isNominal = attributes.get(numAttribute).isNominal();
    +
    +                   if(isNumeric)
    +                   {
    +                           if(value instanceof Double)
    +                                   this.setDenseValue(instance, numAttribute, (double)value);
    +                           else if (value instanceof Long)
    +                                   this.setDenseValue(instance, numAttribute, (long)value);
    +                           else if (value instanceof Integer)
    +                                   this.setDenseValue(instance, numAttribute, (int)value);
    --- End diff --
    
    Shouldn't the check go from more specific (int) to more general (double)?
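
    To make the question concrete, here is a minimal sketch of the narrow-to-wide ordering, written as a standalone helper rather than as a change to AvroLoader. The class and method names (NumericDispatchSketch, toDouble) are hypothetical, and it is an assumption that the values arrive as boxed Integer, Long, or Double objects. Since those boxed types are unrelated final classes, at most one instanceof test can match a given value, so the ordering here is a matter of readability rather than correctness.

```java
public class NumericDispatchSketch {

    /**
     * Hypothetical helper: widens a boxed numeric value to double,
     * testing the narrowest type first and the widest last.
     */
    static double toDouble(Object value) {
        if (value instanceof Integer) {
            return ((Integer) value).doubleValue();
        } else if (value instanceof Long) {
            return ((Long) value).doubleValue();
        } else if (value instanceof Double) {
            return (Double) value;
        }
        // Anything else (including null) is rejected in this sketch.
        throw new IllegalArgumentException("Unsupported numeric type: " + value);
    }

    public static void main(String[] args) {
        System.out.println(toDouble(7));    // boxed as Integer
        System.out.println(toDouble(7L));   // boxed as Long
        System.out.println(toDouble(7.5));  // boxed as Double
    }
}
```

    A single `value instanceof Number` test followed by `((Number) value).doubleValue()` would cover all three cases at once, at the cost of also accepting other Number subtypes such as Float or Short.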


> Integrate Avro Streams with SAMOA
> ---------------------------------
>
>                 Key: SAMOA-47
>                 URL: https://issues.apache.org/jira/browse/SAMOA-47
>             Project: SAMOA
>          Issue Type: New Feature
>          Components: SAMOA-API, SAMOA-Instances
>            Reporter: jayadeepj
>            Priority: Minor
>              Labels: patch
>
> The current SAMOA readers support data streams only in ARFF format. As a
> distributed streaming machine learning framework, SAMOA is therefore limited
> in scope, since end users may have to transform their data to ARFF. Apache
> Avro is a data serialization system that handles data streams in a compact
> binary format and is typically used in conjunction with Big Data ecosystem
> tools. Avro allows two encodings for the data: binary and JSON. Avro support
> may therefore also allow users with JSON data to use SAMOA seamlessly.
> The goal is to build support for Avro streams into SAMOA by adding an Avro
> file stream handler, an Avro loader that reads records and transforms them
> into instances, and a user option to switch between JSON and binary
> encodings. The input format, including the representation of metadata for
> both JSON and binary data, is to be finalized along with the build.



