[ 
https://issues.apache.org/jira/browse/BEAM-7545?focusedWorklogId=268024&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-268024
 ]

ASF GitHub Bot logged work on BEAM-7545:
----------------------------------------

                Author: ASF GitHub Bot
            Created on: 26/Jun/19 22:00
            Start Date: 26/Jun/19 22:00
    Worklog Time Spent: 10m 
      Work Description: akedin commented on pull request #8951: [BEAM-7545] Adding RowCount to TextTable
URL: https://github.com/apache/beam/pull/8951#discussion_r297884214
 
 

 ##########
 File path: sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/provider/text/TextTable.java
 ##########
 @@ -59,6 +68,30 @@ public String getFilePattern() {
     return filePattern;
   }
 
+  @Override
+  public BeamRowCountStatistics getRowCount(PipelineOptions options) {
+    if (rowCountStatistics == null) {
+      rowCountStatistics = getTextRowEstimate(options, getFilePattern());
+    }
+
+    return rowCountStatistics;
+  }
+
+  private static BeamRowCountStatistics getTextRowEstimate(
+      PipelineOptions options, String filePattern) {
+    TextRowCountEstimator textRowCountEstimator =
+        TextRowCountEstimator.builder().setFilePattern(filePattern).build();
+    try {
+      Long rows = textRowCountEstimator.estimateRowCount(options);
+      if (rows != null) {
 
 Review comment:
   is it ever `null`?
 
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


Issue Time Tracking
-------------------

    Worklog Id:     (was: 268024)
    Time Spent: 2h  (was: 1h 50m)

> Row Count Estimation for CSV TextTable
> --------------------------------------
>
>                 Key: BEAM-7545
>                 URL: https://issues.apache.org/jira/browse/BEAM-7545
>             Project: Beam
>          Issue Type: New Feature
>          Components: dsl-sql
>            Reporter: Alireza Samadianzakaria
>            Assignee: Alireza Samadianzakaria
>            Priority: Major
>          Time Spent: 2h
>  Remaining Estimate: 0h
>
> Implementing Row Count Estimation for CSV Tables by reading the first few 
> lines of the file and estimating the number of records based on the length of 
> these lines and the total length of the file.
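
The estimation idea in the issue description can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code from the pull request: the class `RowCountSketch` and the method `estimateRowCount` are hypothetical names, the sample lines are assumed to have already been read from the start of the file, and each line is assumed to end with a single one-byte newline.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

public class RowCountSketch {

  /**
   * Hypothetical helper (not Beam's TextRowCountEstimator): estimates the
   * number of rows in a file from a sample of its first lines and the total
   * file size in bytes.
   */
  static long estimateRowCount(List<String> sampleLines, long totalBytes) {
    if (sampleLines.isEmpty()) {
      // Nothing was sampled, so there is no basis for an estimate.
      return 0L;
    }
    long sampledBytes = 0L;
    for (String line : sampleLines) {
      // Assumption: UTF-8 encoding and a one-byte line terminator per line.
      sampledBytes += line.getBytes(StandardCharsets.UTF_8).length + 1;
    }
    // Average bytes per sampled row, then scale up to the whole file.
    double averageRowBytes = (double) sampledBytes / sampleLines.size();
    return Math.round(totalBytes / averageRowBytes);
  }

  public static void main(String[] args) {
    // Two sampled lines of 3 bytes each (plus newline) from a 16-byte file.
    List<String> sample = Arrays.asList("a,b", "c,d");
    System.out.println(estimateRowCount(sample, 16L)); // prints 4
  }
}
```

The sketch returns a primitive `long`, so a caller would not need a null check; whether the estimator in the pull request can return `null` is not settled by this message.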



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
