[ https://issues.apache.org/jira/browse/SDAP-120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16538815#comment-16538815 ]
Frank Greguska commented on SDAP-120:
-------------------------------------
It appears the regex here:
https://github.com/apache/incubator-sdap-mudrod/blob/faac228e88ebdb9fbd8e134f8550d5dc16738a71/core/src/main/java/org/apache/sdap/mudrod/weblog/pre/ImportLogFile.java#L53
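does not actually match a log line, so nothing is being ingested. That would likely explain why the Users aggregation in the stack trace below ends up with a size of 0.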
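
One way to confirm this kind of mismatch is to run the pattern against a sample line outside of Spark. The following is only a minimal sketch: the pattern is a hypothetical one for the Apache combined log format, not the one in ImportLogFile.java, and the sample lines and the names LogLineRegexCheck, matchesLine and countMatches are made up for illustration.

```java
import java.util.regex.Pattern;

class LogLineRegexCheck {
    // Hypothetical pattern for an Apache combined-format access log line.
    // This is NOT the pattern used in ImportLogFile.java; it is illustrative only.
    static final Pattern COMBINED = Pattern.compile(
        "^(\\S+) (\\S+) (\\S+) \\[([^\\]]+)\\] \"([^\"]*)\" (\\d{3}) (\\S+) \"([^\"]*)\" \"([^\"]*)\"$");

    // True only when the whole line matches the pattern.
    static boolean matchesLine(String line) {
        return COMBINED.matcher(line).matches();
    }

    // Counts how many lines match; a count of 0 means nothing would be ingested.
    static long countMatches(Iterable<String> lines) {
        long count = 0;
        for (String line : lines) {
            if (matchesLine(line)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // Made-up sample lines, not taken from the real January 2018 logs.
        String full = "127.0.0.1 - - [01/Jan/2018:00:00:01 -0800] "
            + "\"GET /index.html HTTP/1.1\" 200 1234 \"-\" \"Mozilla/5.0\"";
        String truncated = "127.0.0.1 - - [01/Jan/2018:00:00:01 -0800] "
            + "\"GET /index.html HTTP/1.1\" 200 1234";
        System.out.println("full line matches: " + matchesLine(full));
        System.out.println("truncated line matches: " + matchesLine(truncated));
    }
}
```

Running the same check with the real pattern and a few lines from the actual log file should show whether the import step is silently dropping every line.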
> Error trying to ingest logs
> ---------------------------
>
> Key: SDAP-120
> URL: https://issues.apache.org/jira/browse/SDAP-120
> Project: Apache Science Data Analytics Platform
> Issue Type: Bug
> Components: mudrod
> Reporter: Frank Greguska
> Priority: Blocker
>
> Trying to ingest January 2018 logs results in an error:
>
> {quote}
> 2018-07-09 18:06:29,119 INFO server.Server (Server.java:doStart(379)) - Started @3794ms
> 2018-07-09 18:06:29,381 INFO handler.ContextHandler (ContextHandler.java:doStart(744)) - Started o.s.j.s.ServletContextHandler@11dcd42c{/metrics/json,null,AVAILABLE}
> 2018-07-09 18:06:29,874 INFO discoveryengine.WeblogDiscoveryEngine (WeblogDiscoveryEngine.java:<init>(51)) - Started Mudrod Weblog Discovery Engine.
> 2018-07-09 18:06:29,874 INFO discoveryengine.WeblogDiscoveryEngine (WeblogDiscoveryEngine.java:preprocess(98)) - Starting Web log preprocessing.
> 2018-07-09 18:06:29,875 INFO discoveryengine.WeblogDiscoveryEngine (WeblogDiscoveryEngine.java:preprocess(106)) - Processing logs dated 201801.gz
> 2018-07-09 18:06:30,013 INFO pre.ImportLogFile (ImportLogFile.java:execute(80)) - Starting Log Import 201801.gz
> 2018-07-09 18:06:31,084 INFO util.Version (Version.java:logVersion(108)) - Elasticsearch Hadoop v5.2.0 [d85a257f9f]
> 2018-07-09 18:06:31,451 INFO rdd.EsRDDWriter (RestService.java:createWriter(562)) - Writing to [log201801.gz/raw.http]
> 2018-07-09 18:08:15,371 INFO rdd.EsRDDWriter (RestService.java:createWriter(562)) - Writing to [log201801.gz/raw.ftp]
> 2018-07-09 18:13:15,916 INFO pre.ImportLogFile (ImportLogFile.java:execute(84)) - Log Import complete. Time elapsed 405 seconds
> 2018-07-09 18:13:15,925 INFO pre.CrawlerDetection (CrawlerDetection.java:execute(82)) - Starting Crawler detection raw.http
> 2018-07-09 18:13:16,262 ERROR main.MudrodEngine (MudrodEngine.java:main(395)) - Error whilst parsing command line.
> java.lang.IllegalArgumentException: [size] must be greater than 0. Found [0] in [Users]
> at org.elasticsearch.search.aggregations.bucket.terms.TermsAggregationBuilder.size(TermsAggregationBuilder.java:148)
> at org.apache.sdap.mudrod.weblog.pre.LogAbstract.getUserTerms(LogAbstract.java:127)
> at org.apache.sdap.mudrod.weblog.pre.LogAbstract.getUserDocs(LogAbstract.java:135)
> at org.apache.sdap.mudrod.weblog.pre.LogAbstract.getUserRDD(LogAbstract.java:100)
> at org.apache.sdap.mudrod.weblog.pre.CrawlerDetection.checkByRateInParallel(CrawlerDetection.java:112)
> at org.apache.sdap.mudrod.weblog.pre.CrawlerDetection.execute(CrawlerDetection.java:85)
> at org.apache.sdap.mudrod.discoveryengine.WeblogDiscoveryEngine.preprocess(WeblogDiscoveryEngine.java:112)
> at org.apache.sdap.mudrod.main.MudrodEngine.startFullIngest(MudrodEngine.java:240)
> at org.apache.sdap.mudrod.main.MudrodEngine.main(MudrodEngine.java:385)
> {quote}
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)