[ https://issues.apache.org/jira/browse/SAMOA-47?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14996192#comment-14996192 ]
ASF GitHub Bot commented on SAMOA-47:
-------------------------------------
Github user gdfm commented on a diff in the pull request:
https://github.com/apache/incubator-samoa/pull/40#discussion_r44249782
--- Diff: samoa-instances/src/main/java/org/apache/samoa/instances/AvroLoader.java ---
@@ -0,0 +1,285 @@
+package org.apache.samoa.instances;
+
+/*
+ * #%L
+ * SAMOA
+ * %%
+ * Copyright (C) 2014 - 2015 Apache Software Foundation
+ * %%
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ * #L%
+ */
+
+
+import java.io.InputStream;
+import java.util.ArrayList;
+import java.util.List;
+
+import org.apache.avro.Schema;
+import org.apache.avro.Schema.Field;
+import org.apache.avro.generic.GenericData.EnumSymbol;
+import org.apache.avro.generic.GenericRecord;
+import org.apache.avro.io.DatumReader;
+
+/**
+ * Load Data from Avro Stream and parse to corresponding Dense & Parse Instances
+ * Abstract Class: Subclass this class for different types of Avro Encodings
+ *
+ *
+ */
+public abstract class AvroLoader implements Loader {
+
+ private static final long serialVersionUID = 1L;
+
+ /** Representation of the Avro Schema for the Instances being read. Built from the first line in the data */
+ protected Schema schema = null;
+
+ /** Meta-data of the Instance */
+ protected InstanceInformation instanceInformation;
+
+ /** List of attributes in the data as read from the schema */
+ protected List<Attribute> attributes;
+
+ /** This variable is to check if the data stored is Sparse or Dense */
+ protected boolean isSparseData;
+
+ protected int classAttribute;
+
+ /** Datum Reader for Avro Data*/
+ public DatumReader<GenericRecord> datumReader = null;
+
+ public AvroLoader(int classAttribute) {
+ this.classAttribute = classAttribute;
+ this.isSparseData = false;
+ }
+
+ /** Intialize Avro Schema, Meta Data, InstanceInformation from Input Avro Stream */
+ public abstract void initializeSchema(InputStream inputStream);
+
+ /** Read a single SAMOA Instance from Input Avro Stream */
+ public abstract Instance readInstance();
+
+ /**
+ * Method to read Dense Instances from Avro File
+ * @return Instance
+ */
+ protected Instance readInstanceDense(GenericRecord record)
+ {
+ Instance instance = new DenseInstance(this.instanceInformation.numAttributes() + 1);
+ int numAttribute = 0;
+
+ for (Attribute attribute : attributes) {
+ Object value = record.get(attribute.name);
+
+ boolean isNumeric = attributes.get(numAttribute).isNumeric();
+ boolean isNominal = attributes.get(numAttribute).isNominal();
+
+ if(isNumeric)
+ {
+ if(value instanceof Double)
+ this.setDenseValue(instance, numAttribute, (double)value);
+ else if (value instanceof Long)
+ this.setDenseValue(instance, numAttribute, (long)value);
+ else if (value instanceof Integer)
+ this.setDenseValue(instance, numAttribute, (int)value);
--- End diff --
Shouldn't the check go from more specific (int) to more general (double)?
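
To illustrate the ordering suggested above, here is a minimal standalone sketch, not code from the patch. The class and method names (NumericWidening, toDouble, toDoubleViaNumber) are hypothetical, and it assumes the decoded value arrives as a boxed Integer, Long, Float or Double. Since these boxed types are sibling classes, the order of the instanceof checks does not change which branch is taken; going from narrow to wide is mainly a readability convention.

```java
public class NumericWidening {

    /** Hypothetical helper: checks go from the narrowest type to the widest. */
    static double toDouble(Object value) {
        if (value instanceof Integer) {
            return (Integer) value;   // unbox to int, then widen to double (exact)
        } else if (value instanceof Long) {
            return (Long) value;      // unbox to long, then widen (may round above 2^53)
        } else if (value instanceof Float) {
            return (Float) value;     // unbox to float, then widen to double
        } else if (value instanceof Double) {
            return (Double) value;    // already a double
        }
        throw new IllegalArgumentException("Not a numeric value: " + value);
    }

    /** Shorter alternative: every boxed numeric type extends java.lang.Number. */
    static double toDoubleViaNumber(Object value) {
        if (value instanceof Number) {
            return ((Number) value).doubleValue();
        }
        throw new IllegalArgumentException("Not a numeric value: " + value);
    }

    public static void main(String[] args) {
        // Both helpers agree for each boxed input type.
        Object[] samples = {7, 7L, 2.5f, 2.5d};
        for (Object sample : samples) {
            System.out.println(sample.getClass().getSimpleName()
                + " -> " + toDouble(sample)
                + " / " + toDoubleViaNumber(sample));
        }
    }
}
```

If the three branches in the diff all end up storing a double anyway, the Number-based variant may be a way to collapse them into one call, assuming setDenseValue accepts a double.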
> Integrate Avro Streams with SAMOA
> ---------------------------------
>
> Key: SAMOA-47
> URL: https://issues.apache.org/jira/browse/SAMOA-47
> Project: SAMOA
> Issue Type: New Feature
> Components: SAMOA-API, SAMOA-Instances
> Reporter: jayadeepj
> Priority: Minor
> Labels: patch
>
> The current SAMOA readers support only data streams in ARFF format. As a
> distributed streaming machine learning framework, SAMOA is therefore limited
> in scope, since end users may have to transform their data to ARFF. Apache Avro
> is a data serialization system that handles data streams in a compact binary
> format and is typically used in conjunction with Big Data ecosystem
> tools. Avro allows two encodings for the data: binary and JSON. Avro
> support would therefore also let users with JSON data use SAMOA seamlessly.
> The goal is to build support for Avro streams into SAMOA by adding an Avro file
> stream handler, an Avro loader that reads records and transforms them into
> instances, and a user option to switch between the JSON and binary encodings.
> The input format, including the representation of metadata for both JSON and
> binary data, is to be finalized along with the build.