[ 
https://issues.apache.org/jira/browse/SDAP-120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16589069#comment-16589069
 ] 

ASF GitHub Bot commented on SDAP-120:
-------------------------------------

lewismc commented on issue #32: SDAP-120 Error trying to ingest logs
URL: 
https://github.com/apache/incubator-sdap-mudrod/pull/32#issuecomment-415087443
 
 
   The regression happened during the move to HTTPS, which was April of 2017 
(wow that was a long time ago). You can see from the logs available at 
ftp://podaac.jpl.nasa.gov/misc/outgoing/cjf/mudrod/2017/04/ that the WWW.* 
artifacts use the Combined Log Format, whereas the WWWssl logs do not! 
Examples are below.
    
   Combined Log Format (Pre HTTPS/SSL Migration < April 2017)
   ```
   198.118.243.84 - - [29/Apr/2017:00:00:49 +0000] "GET 
/announcements/2016-08-29_RapidScat_Data_Loss_from_Power_Outage HTTP/1.1" 302 
274 "-" "gsa-crawler-earthdata (Enterprise; T5-ACQ4X8DLW7SKC; 
[email protected],[email protected])"
   ```
   
   Common Log Format (Post HTTPS/SSL Migration > April 2017)
   ```
   131.161.10.197 - - [29/Apr/2017:00:00:20 +0000] "GET 
/datasetlist?ids=Sensor:DataFormat:Collections:Measurement&values=SMAP_RADIOMETER:HDF5:Aquarius-CAP:Salinity%252525252525252FDensity
 HTTP/1.1" 200 86612
   ```
   You can clearly see the absence of the Referrer and User Agent in the Common 
Log Format.
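
   A quick way to check which format a given log file uses is to match each 
line against both layouts. The following is only a minimal sketch, not code 
from Mudrod: the regular expressions, the `classify` helper and the shortened 
sample lines are assumptions made for illustration, based on the standard 
Common and Combined Log Format field order.

```python
import re

# Common Log Format: host ident authuser [timestamp] "request" status bytes
COMMON = (
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)
# Combined Log Format: Common plus quoted Referer and User-Agent fields
COMBINED = COMMON + r' "(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'

COMMON_RE = re.compile(COMMON)
COMBINED_RE = re.compile(COMBINED)


def classify(line):
    """Return 'combined', 'common' or 'unknown' for one access log line."""
    line = line.strip()
    if COMBINED_RE.fullmatch(line):
        return "combined"
    if COMMON_RE.fullmatch(line):
        return "common"
    return "unknown"


# Sample lines modelled on the two examples above, joined onto one line.
# The user agent and the query string are shortened for readability.
COMBINED_SAMPLE = (
    '198.118.243.84 - - [29/Apr/2017:00:00:49 +0000] '
    '"GET /announcements/2016-08-29_RapidScat_Data_Loss_from_Power_Outage '
    'HTTP/1.1" 302 274 "-" "gsa-crawler-earthdata"'
)
COMMON_SAMPLE = (
    '131.161.10.197 - - [29/Apr/2017:00:00:20 +0000] '
    '"GET /datasetlist HTTP/1.1" 200 86612'
)

if __name__ == "__main__":
    for sample in (COMBINED_SAMPLE, COMMON_SAMPLE):
        print(classify(sample))
```

   Running something like this over a sample of lines from each file would 
show whether the Referer and User-Agent fields are present before attempting 
an ingest.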
    
   Addressing this regression is a priority. Fixing it should not have an 
adverse impact on web server performance, as this information is already 
cached in the web server; we simply do not log it anymore. I'm working to get 
this fixed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


> Error trying to ingest logs
> ---------------------------
>
>                 Key: SDAP-120
>                 URL: https://issues.apache.org/jira/browse/SDAP-120
>             Project: Apache Science Data Analytics Platform
>          Issue Type: Bug
>          Components: mudrod
>            Reporter: Frank Greguska
>            Priority: Blocker
>
> Trying to ingest January 2018 logs results in error
>  
> {quote}
> 2018-07-09 18:06:29,119 INFO  server.Server (Server.java:doStart(379)) - 
> Started @3794ms
> 2018-07-09 18:06:29,381 INFO  handler.ContextHandler 
> (ContextHandler.java:doStart(744)) - Started 
> o.s.j.s.ServletContextHandler@11dcd42c{/metrics/json,null,AVAILABLE}
> 2018-07-09 18:06:29,874 INFO  discoveryengine.WeblogDiscoveryEngine 
> (WeblogDiscoveryEngine.java:<init>(51)) - Started Mudrod Weblog Discovery 
> Engine.
> 2018-07-09 18:06:29,874 INFO  discoveryengine.WeblogDiscoveryEngine 
> (WeblogDiscoveryEngine.java:preprocess(98)) - Starting Web log preprocessing.
> 2018-07-09 18:06:29,875 INFO  discoveryengine.WeblogDiscoveryEngine 
> (WeblogDiscoveryEngine.java:preprocess(106)) - Processing logs dated 201801.gz
> 2018-07-09 18:06:30,013 INFO  pre.ImportLogFile 
> (ImportLogFile.java:execute(80)) - Starting Log Import 201801.gz
> 2018-07-09 18:06:31,084 INFO  util.Version (Version.java:logVersion(108)) - 
> Elasticsearch Hadoop v5.2.0 [d85a257f9f]
> 2018-07-09 18:06:31,451 INFO  rdd.EsRDDWriter 
> (RestService.java:createWriter(562)) - Writing to [log201801.gz/raw.http]
> 2018-07-09 18:08:15,371 INFO  rdd.EsRDDWriter 
> (RestService.java:createWriter(562)) - Writing to [log201801.gz/raw.ftp]
> 2018-07-09 18:13:15,916 INFO  pre.ImportLogFile 
> (ImportLogFile.java:execute(84)) - Log Import complete. Time elapsed 405 
> seconds
> 2018-07-09 18:13:15,925 INFO  pre.CrawlerDetection 
> (CrawlerDetection.java:execute(82)) - Starting Crawler detection raw.http
> 2018-07-09 18:13:16,262 ERROR main.MudrodEngine (MudrodEngine.java:main(395)) 
> - Error whilst parsing command line.
> java.lang.IllegalArgumentException: [size] must be greater than 0. Found [0] 
> in [Users]
>  at 
> org.elasticsearch.search.aggregations.bucket.terms.TermsAggregationBuilder.size(TermsAggregationBuilder.java:148)
>  at 
> org.apache.sdap.mudrod.weblog.pre.LogAbstract.getUserTerms(LogAbstract.java:127)
>  at 
> org.apache.sdap.mudrod.weblog.pre.LogAbstract.getUserDocs(LogAbstract.java:135)
>  at 
> org.apache.sdap.mudrod.weblog.pre.LogAbstract.getUserRDD(LogAbstract.java:100)
>  at 
> org.apache.sdap.mudrod.weblog.pre.CrawlerDetection.checkByRateInParallel(CrawlerDetection.java:112)
>  at 
> org.apache.sdap.mudrod.weblog.pre.CrawlerDetection.execute(CrawlerDetection.java:85)
>  at 
> org.apache.sdap.mudrod.discoveryengine.WeblogDiscoveryEngine.preprocess(WeblogDiscoveryEngine.java:112)
>  at 
> org.apache.sdap.mudrod.main.MudrodEngine.startFullIngest(MudrodEngine.java:240)
>  at org.apache.sdap.mudrod.main.MudrodEngine.main(MudrodEngine.java:385)
> {quote}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
