[ https://issues.apache.org/jira/browse/APEXMALHAR-2116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15342995#comment-15342995 ]
ASF GitHub Bot commented on APEXMALHAR-2116:
--------------------------------------------
Github user amberarrow commented on a diff in the pull request:
https://github.com/apache/apex-malhar/pull/326#discussion_r67969125
--- Diff: library/src/main/java/org/apache/apex/malhar/lib/fs/FSRecordReader.java ---
@@ -0,0 +1,175 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied. See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.apex.malhar.lib.fs;
+
+import java.io.IOException;
+
+import org.apache.commons.beanutils.ConvertUtils;
+import org.apache.commons.beanutils.converters.AbstractConverter;
+import org.apache.hadoop.fs.FSDataInputStream;
+
+import com.datatorrent.api.Context.OperatorContext;
+import com.datatorrent.api.DefaultOutputPort;
+import com.datatorrent.lib.io.block.BlockMetadata;
+import com.datatorrent.lib.io.block.FSSliceReader;
+import com.datatorrent.lib.io.block.ReaderContext;
+
+/**
+ * This operator can be used for reading records/tuples from Filesystem
+ * in parallel (without ordering guarantees between tuples).
+ * Records can be delimited (e.g. newline) or fixed with records.
--- End diff --
fixed with => fixed width
> File Record reader module
> -------------------------
>
> Key: APEXMALHAR-2116
> URL: https://issues.apache.org/jira/browse/APEXMALHAR-2116
> Project: Apache Apex Malhar
> Issue Type: New Feature
> Reporter: Yogi Devendra
> Assignee: Yogi Devendra
>
> This will be useful for use cases that involve reading from files "line
> by line" in parallel and emitting each line as a separate tuple.
> The proposal is to have a new Module that allows users to monitor
> directories, read files, and emit data records (tuples). Records are delimited
> by a record separator (e.g. newline) or have a fixed size (number of bytes).
> Plan is as follows:
> 1. New operator FileRecordReader, which will extend BlockReader.
> 2. This operator will have a configuration option to select the mode:
> FIXED_LENGTH or SEPARATOR_BASED records.
> 3. It will use the appropriate ReaderContext based on the mode.
> 4. New module FileRecordReaderModule, which wraps FileSplitter (existing) +
> FileRecordReader operator.
> The reason for having a separate operator rather than reusing BlockReader is
> that its output port signature differs from BlockReader's.
>
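For illustration only, the two record modes named in the plan could be sketched in plain Java as below. This is not the FSRecordReader code from the pull request and does not use the Apex ReaderContext API; the class RecordSplitSketch, the split method, and its parameters are hypothetical names, and the sketch assumes a whole block is already in memory, so records that span block boundaries are not handled.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: splits one in-memory block into records.
// Mode names mirror the issue description; everything else is assumed.
class RecordSplitSketch {

  enum Mode { FIXED_LENGTH, SEPARATOR_BASED }

  static List<String> split(byte[] block, Mode mode, int recordLength, byte separator) {
    List<String> records = new ArrayList<>();
    if (mode == Mode.FIXED_LENGTH) {
      if (recordLength <= 0) {
        throw new IllegalArgumentException("recordLength must be positive");
      }
      // Cut the block every recordLength bytes; the last record may be shorter.
      for (int start = 0; start < block.length; start += recordLength) {
        int end = Math.min(start + recordLength, block.length);
        records.add(new String(block, start, end - start, StandardCharsets.UTF_8));
      }
    } else {
      // Cut the block at each separator byte; the separator itself is dropped.
      int start = 0;
      for (int i = 0; i < block.length; i++) {
        if (block[i] == separator) {
          records.add(new String(block, start, i - start, StandardCharsets.UTF_8));
          start = i + 1;
        }
      }
      // Emit trailing bytes that are not terminated by a separator.
      if (start < block.length) {
        records.add(new String(block, start, block.length - start, StandardCharsets.UTF_8));
      }
    }
    return records;
  }

  public static void main(String[] args) {
    byte[] lines = "a,1\nb,2\nc,3".getBytes(StandardCharsets.UTF_8);
    System.out.println(split(lines, Mode.SEPARATOR_BASED, 0, (byte) '\n'));

    byte[] fixed = "AAAABBBBCC".getBytes(StandardCharsets.UTF_8);
    System.out.println(split(fixed, Mode.FIXED_LENGTH, 4, (byte) 0));
  }
}
```

In the actual operator the mode would presumably select a ReaderContext rather than branch inline, as item 3 of the plan suggests; the inline branch here only keeps the sketch self-contained.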