[
https://issues.apache.org/jira/browse/FLINK-671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14063258#comment-14063258
]
Chesnay Schepler commented on FLINK-671:
----------------------------------------
Finally found the issue: long strings (> 4000 bytes) are not read properly on
the Java side. When I filter these out, the WC runs fine.
I never checked how much data Java actually reads, and only used a single call
to read. Since at most 4k bytes are present at that time (the size of the
buffer behind standard pipes), that call reads only those bytes and forgets
about the rest. The next read call then picks up the leftover bytes of the
previous record, which were not supposed to be there, and this generally
breaks the program.
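
To illustrate the fix, here is a minimal sketch of a read loop that keeps
calling read until the expected number of bytes has arrived. It is not the
actual patch: the names ReadFullySketch, ChunkedStream and readFully are
hypothetical, and the 4096-byte chunk size only imitates the pipe buffer
described above.

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

class ReadFullySketch {

    /** Hypothetical stand-in for a pipe: hands out at most maxChunk bytes per read. */
    static class ChunkedStream extends FilterInputStream {
        private final int maxChunk;

        ChunkedStream(InputStream in, int maxChunk) {
            super(in);
            this.maxChunk = maxChunk;
        }

        @Override
        public int read(byte[] b, int off, int len) throws IOException {
            return super.read(b, off, Math.min(len, maxChunk));
        }
    }

    /** Reads exactly length bytes, looping because read may return fewer. */
    static byte[] readFully(InputStream in, int length) throws IOException {
        byte[] buffer = new byte[length];
        int offset = 0;
        while (offset < length) {
            int n = in.read(buffer, offset, length - offset);
            if (n < 0) {
                throw new EOFException("stream ended after " + offset + " of " + length + " bytes");
            }
            offset += n;
        }
        return buffer;
    }

    public static void main(String[] args) throws IOException {
        byte[] payload = new byte[10000];

        // A single read only returns what is currently available.
        InputStream single = new ChunkedStream(new ByteArrayInputStream(payload), 4096);
        System.out.println("single read: " + single.read(new byte[payload.length]));

        // The loop keeps reading until the whole record has arrived.
        InputStream looped = new ChunkedStream(new ByteArrayInputStream(payload), 4096);
        System.out.println("read loop:   " + readFully(looped, payload.length).length);
    }
}
```

The JDK already provides the same behaviour through
java.io.DataInputStream.readFully, so wrapping the process stream in a
DataInputStream would be an alternative to a hand-written loop.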
> Python interface for new API (Map/Reduce)
> -----------------------------------------
>
> Key: FLINK-671
> URL: https://issues.apache.org/jira/browse/FLINK-671
> Project: Flink
> Issue Type: Improvement
> Components: Python API
> Reporter: Chesnay Schepler
> Assignee: Chesnay Schepler
> Labels: github-import
> Fix For: pre-apache
>
> Attachments: pull-request-671-9139035883911146960.patch
>
>
> ([#615|https://github.com/stratosphere/stratosphere/issues/615] |
> [FLINK-615|https://issues.apache.org/jira/browse/FLINK-615])
> ---------------- Imported from GitHub ----------------
> Url: https://github.com/stratosphere/stratosphere/pull/671
> Created by: [zentol|https://github.com/zentol]
> Labels: enhancement, java api,
> Milestone: Release 0.6 (unplanned)
> Created at: Wed Apr 09 20:52:06 CEST 2014
> State: open
--
This message was sent by Atlassian JIRA
(v6.2#6252)