zhuxiaoshang created FLINK-20538:
------------------------------------
Summary: sink.rolling-policy.file-size does not work in filesystem connector
Key: FLINK-20538
URL: https://issues.apache.org/jira/browse/FLINK-20538
Project: Flink
Issue Type: Bug
Components: Connectors / FileSystem
Affects Versions: 1.11.1
Reporter: zhuxiaoshang
When I use the SQL filesystem connector to write data to HDFS with
sink.rolling-policy.file-size set to 50MB, the setting does not seem to take
effect: the output still contains files larger than 100MB.
My table DDL is:
{code:sql}
CREATE TABLE cpc_bd_recall_log_hdfs (
  log_timestamp BIGINT,
  ip STRING,
  `raw` STRING,
  `day` STRING,
  `hour` STRING,
  `minute` STRING
) PARTITIONED BY (`day`, `hour`, `minute`) WITH (
  'connector' = 'filesystem',
  'path' = 'hdfs://xxx/test.db/hdfs_test',
  'format' = 'parquet',
  'parquet.compression' = 'SNAPPY',
  'sink.rolling-policy.file-size' = '50MB',
  'sink.partition-commit.policy.kind' = 'success-file',
  'sink.partition-commit.delay' = '60s'
);
{code}
The resulting HDFS files are:
{code:java}
0 2020-12-04 14:56
hdfs://xxx/test.db/hdfs_test/day=2020-12-04/hour=14/minute=55/_SUCCESS
-rw-r--r-- 3 hadoop hadoop 31.7 M 2020-12-04 14:55
hdfs://xxx/test.db/hdfs_test/day=2020-12-04/hour=14/minute=55/part-3dca3b00-fd94-4f49-bdf8-a8b65bcfa92c-0-2500
-rw-r--r-- 3 hadoop hadoop 121.8 M 2020-12-04 14:56
hdfs://xxx/test.db/hdfs_test/day=2020-12-04/hour=14/minute=55/part-3dca3b00-fd94-4f49-bdf8-a8b65bcfa92c-0-2501
-rw-r--r-- 3 hadoop hadoop 31.9 M 2020-12-04 14:55
hdfs://xxx/test.db/hdfs_test/day=2020-12-04/hour=14/minute=55/part-3dca3b00-fd94-4f49-bdf8-a8b65bcfa92c-1-2499
-rw-r--r-- 3 hadoop hadoop 122.0 M 2020-12-04 14:56
hdfs://xxx/test.db/hdfs_test/day=2020-12-04/hour=14/minute=55/part-3dca3b00-fd94-4f49-bdf8-a8b65bcfa92c-1-2500
-rw-r--r-- 3 hadoop hadoop 31.8 M 2020-12-04 14:55
hdfs://xxx/test.db/hdfs_test/day=2020-12-04/hour=14/minute=55/part-3dca3b00-fd94-4f49-bdf8-a8b65bcfa92c-10-2501
-rw-r--r-- 3 hadoop hadoop 121.8 M 2020-12-04 14:56
hdfs://xxx/test.db/hdfs_test/day=2020-12-04/hour=14/minute=55/part-3dca3b00-fd94-4f49-bdf8-a8b65bcfa92c-10-2502
-rw-r--r-- 3 hadoop hadoop 31.9 M 2020-12-04 14:55
hdfs://xxx/test.db/hdfs_test/day=2020-12-04/hour=14/minute=55/part-3dca3b00-fd94-4f49-bdf8-a8b65bcfa92c-11-2500
-rw-r--r-- 3 hadoop hadoop 122.2 M 2020-12-04 14:56
hdfs://xxx/test.db/hdfs_test/day=2020-12-04/hour=14/minute=55/part-3dca3b00-fd94-4f49-bdf8-a8b65bcfa92c-11-2501
-rw-r--r-- 3 hadoop hadoop 31.9 M 2020-12-04 14:55
hdfs://xxx/test.db/hdfs_test/day=2020-12-04/hour=14/minute=55/part-3dca3b00-fd94-4f49-bdf8-a8b65bcfa92c-12-2500
-rw-r--r-- 3 hadoop hadoop 122.2 M 2020-12-04 14:56
hdfs://xxx/test.db/hdfs_test/day=2020-12-04/hour=14/minute=55/part-3dca3b00-fd94-4f49-bdf8-a8b65bcfa92c-12-2501
-rw-r--r-- 3 hadoop hadoop 31.8 M 2020-12-04 14:55
hdfs://xxx/test.db/hdfs_test/day=2020-12-04/hour=14/minute=55/part-3dca3b00-fd94-4f49-bdf8-a8b65bcfa92c-13-2499
-rw-r--r-- 3 hadoop hadoop 122.0 M 2020-12-04 14:56
hdfs://xxx/test.db/hdfs_test/day=2020-12-04/hour=14/minute=55/part-3dca3b00-fd94-4f49-bdf8-a8b65bcfa92c-13-2500
-rw-r--r-- 3 hadoop hadoop 31.6 M 2020-12-04 14:55
hdfs://xxx/test.db/hdfs_test/day=2020-12-04/hour=14/minute=55/part-3dca3b00-fd94-4f49-bdf8-a8b65bcfa92c-14-2500
-rw-r--r-- 3 hadoop hadoop 122.1 M 2020-12-04 14:56
hdfs://xxx/test.db/hdfs_test/day=2020-12-04/hour=14/minute=55/part-3dca3b00-fd94-4f49-bdf8-a8b65bcfa92c-14-2501
-rw-r--r-- 3 hadoop hadoop 31.9 M 2020-12-04 14:55
hdfs://xxx/test.db/hdfs_test/day=2020-12-04/hour=14/minute=55/part-3dca3b00-fd94-4f49-bdf8-a8b65bcfa92c-15-2498
-rw-r--r-- 3 hadoop hadoop 121.8 M 2020-12-04 14:56
hdfs://xxx/test.db/hdfs_test/day=2020-12-04/hour=14/minute=55/part-3dca3b00-fd94-4f49-bdf8-a8b65bcfa92c-15-2499
-rw-r--r-- 3 hadoop hadoop 31.7 M 2020-12-04 14:55
hdfs://xxx/test.db/hdfs_test/day=2020-12-04/hour=14/minute=55/part-3dca3b00-fd94-4f49-bdf8-a8b65bcfa92c-16-2501
-rw-r--r-- 3 hadoop hadoop 122.0 M 2020-12-04 14:56
hdfs://xxx/test.db/hdfs_test/day=2020-12-04/hour=14/minute=55/part-3dca3b00-fd94-4f49-bdf8-a8b65bcfa92c-16-2502
-rw-r--r-- 3 hadoop hadoop 31.7 M 2020-12-04 14:55
hdfs://xxx/test.db/hdfs_test/day=2020-12-04/hour=14/minute=55/part-3dca3b00-fd94-4f49-bdf8-a8b65bcfa92c-17-2500
-rw-r--r-- 3 hadoop hadoop 122.5 M 2020-12-04 14:56
hdfs://xxx/test.db/hdfs_test/day=2020-12-04/hour=14/minute=55/part-3dca3b00-fd94-4f49-bdf8-a8b65bcfa92c-17-2501
-rw-r--r-- 3 hadoop hadoop 31.8 M 2020-12-04 14:55
hdfs://xxx/test.db/hdfs_test/day=2020-12-04/hour=14/minute=55/part-3dca3b00-fd94-4f49-bdf8-a8b65bcfa92c-18-2500
-rw-r--r-- 3 hadoop hadoop 121.7 M 2020-12-04 14:56
hdfs://xxx/test.db/hdfs_test/day=2020-12-04/hour=14/minute=55/part-3dca3b00-fd94-4f49-bdf8-a8b65bcfa92c-18-2501
-rw-r--r-- 3 hadoop hadoop 31.9 M 2020-12-04 14:55
hdfs://xxx/test.db/hdfs_test/day=2020-12-04/hour=14/minute=55/part-3dca3b00-fd94-4f49-bdf8-a8b65bcfa92c-19-2501
-rw-r--r-- 3 hadoop hadoop 121.7 M 2020-12-04 14:56
hdfs://xxx/test.db/hdfs_test/day=2020-12-04/hour=14/minute=55/part-3dca3b00-fd94-4f49-bdf8-a8b65bcfa92c-19-2502
-rw-r--r-- 3 hadoop hadoop 31.6 M 2020-12-04 14:55
hdfs://xxx/test.db/hdfs_test/day=2020-12-04/hour=14/minute=55/part-3dca3b00-fd94-4f49-bdf8-a8b65bcfa92c-2-2499
-rw-r--r-- 3 hadoop hadoop 121.6 M 2020-12-04 14:56
hdfs://xxx/test.db/hdfs_test/day=2020-12-04/hour=14/minute=55/part-3dca3b00-fd94-4f49-bdf8-a8b65bcfa92c-2-2500
-rw-r--r-- 3 hadoop hadoop 31.8 M 2020-12-04 14:55
hdfs://xxx/test.db/hdfs_test/day=2020-12-04/hour=14/minute=55/part-3dca3b00-fd94-4f49-bdf8-a8b65bcfa92c-3-2500
-rw-r--r-- 3 hadoop hadoop 121.8 M 2020-12-04 14:56
hdfs://xxx/test.db/hdfs_test/day=2020-12-04/hour=14/minute=55/part-3dca3b00-fd94-4f49-bdf8-a8b65bcfa92c-3-2501
-rw-r--r-- 3 hadoop hadoop 31.6 M 2020-12-04 14:55
hdfs://xxx/test.db/hdfs_test/day=2020-12-04/hour=14/minute=55/part-3dca3b00-fd94-4f49-bdf8-a8b65bcfa92c-4-2499
-rw-r--r-- 3 hadoop hadoop 122.1 M 2020-12-04 14:56
hdfs://xxx/test.db/hdfs_test/day=2020-12-04/hour=14/minute=55/part-3dca3b00-fd94-4f49-bdf8-a8b65bcfa92c-4-2500
-rw-r--r-- 3 hadoop hadoop 31.6 M 2020-12-04 14:55
hdfs://xxx/test.db/hdfs_test/day=2020-12-04/hour=14/minute=55/part-3dca3b00-fd94-4f49-bdf8-a8b65bcfa92c-5-2499
-rw-r--r-- 3 hadoop hadoop 121.8 M 2020-12-04 14:56
hdfs://xxx/test.db/hdfs_test/day=2020-12-04/hour=14/minute=55/part-3dca3b00-fd94-4f49-bdf8-a8b65bcfa92c-5-2500
-rw-r--r-- 3 hadoop hadoop 31.8 M 2020-12-04 14:55
hdfs://xxx/test.db/hdfs_test/day=2020-12-04/hour=14/minute=55/part-3dca3b00-fd94-4f49-bdf8-a8b65bcfa92c-6-2499
-rw-r--r-- 3 hadoop hadoop 121.5 M 2020-12-04 14:56
hdfs://xxx/test.db/hdfs_test/day=2020-12-04/hour=14/minute=55/part-3dca3b00-fd94-4f49-bdf8-a8b65bcfa92c-6-2500
-rw-r--r-- 3 hadoop hadoop 31.6 M 2020-12-04 14:55
hdfs://xxx/test.db/hdfs_test/day=2020-12-04/hour=14/minute=55/part-3dca3b00-fd94-4f49-bdf8-a8b65bcfa92c-7-2500
-rw-r--r-- 3 hadoop hadoop 122.0 M 2020-12-04 14:56
hdfs://xxx/test.db/hdfs_test/day=2020-12-04/hour=14/minute=55/part-3dca3b00-fd94-4f49-bdf8-a8b65bcfa92c-7-2501
-rw-r--r-- 3 hadoop hadoop 31.7 M 2020-12-04 14:55
hdfs://xxx/test.db/hdfs_test/day=2020-12-04/hour=14/minute=55/part-3dca3b00-fd94-4f49-bdf8-a8b65bcfa92c-8-2501
-rw-r--r-- 3 hadoop hadoop 122.0 M 2020-12-04 14:56
hdfs://xxx/test.db/hdfs_test/day=2020-12-04/hour=14/minute=55/part-3dca3b00-fd94-4f49-bdf8-a8b65bcfa92c-8-2502
-rw-r--r-- 3 hadoop hadoop 31.9 M 2020-12-04 14:55
hdfs://xxx/test.db/hdfs_test/day=2020-12-04/hour=14/minute=55/part-3dca3b00-fd94-4f49-bdf8-a8b65bcfa92c-9-2501
-rw-r--r-- 3 hadoop hadoop 121.9 M 2020-12-04 14:56
hdfs://xxx/test.db/hdfs_test/day=2020-12-04/hour=14/minute=55/part-3dca3b00-fd94-4f49-bdf8-a8b65bcfa92c-9-2502
{code}
However, when I dig into the source code, I see that writing an element to a
bucket invokes `shouldRollOnEvent` in TableRollingPolicy, which I would expect
to enforce the size limit on every write.
I don't understand how this can happen. Is this a bug, or am I misunderstanding
something?
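One possible explanation, which this report does not confirm, is that the size seen by the rolling check only counts bytes already flushed to the part file, while a columnar writer such as Parquet may hold a large block in memory before flushing. The sketch below is a simplified model of that interaction, not Flink code: `RollingSketch`, `BufferingWriter`, `visibleSize`, `simulate`, and the 128-unit block size are all hypothetical names and values chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of a size-based roll check; this is NOT Flink's implementation. */
class RollingSketch {

    /** Assumed writer behavior: records are buffered and flushed only in whole blocks. */
    static final class BufferingWriter {
        private final long blockSize;
        private long buffered; // held in memory, invisible to the size check
        private long flushed;  // already written to the part file

        BufferingWriter(long blockSize) {
            this.blockSize = blockSize;
        }

        void write(long recordSize) {
            buffered += recordSize;
            if (buffered >= blockSize) {
                flushed += buffered;
                buffered = 0;
            }
        }

        /** Size that a rolling policy would observe in this model. */
        long visibleSize() {
            return flushed;
        }

        /** Flushes the remaining buffer and returns the final file size. */
        long close() {
            flushed += buffered;
            buffered = 0;
            return flushed;
        }
    }

    /** Size check applied before each element is written. */
    static boolean shouldRollOnEvent(long visibleSize, long rollingFileSize) {
        return visibleSize > rollingFileSize;
    }

    /** Writes {@code records} records and returns the sizes of the resulting files. */
    static List<Long> simulate(int records, long recordSize, long blockSize, long rollingFileSize) {
        List<Long> fileSizes = new ArrayList<>();
        BufferingWriter writer = new BufferingWriter(blockSize);
        for (int i = 0; i < records; i++) {
            if (shouldRollOnEvent(writer.visibleSize(), rollingFileSize)) {
                fileSizes.add(writer.close());
                writer = new BufferingWriter(blockSize);
            }
            writer.write(recordSize);
        }
        fileSizes.add(writer.close());
        return fileSizes;
    }

    public static void main(String[] args) {
        // Threshold 50, block size 128: the check only fires after a full block is flushed.
        System.out.println(simulate(300, 1, 128, 50)); // prints [128, 128, 44]
        // Threshold 50, block size 1: every record is visible immediately.
        System.out.println(simulate(300, 1, 1, 50));   // prints [51, 51, 51, 51, 51, 45]
    }
}
```

In this model, a 50-unit threshold combined with 128-unit blocks produces files well above the threshold, while unbuffered writes stay close to it. Whether the same effect explains the 100MB+ files listed above would need to be checked against the actual writer and its block-size settings.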