[ 
https://issues.apache.org/jira/browse/METRON-760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anand Subramanian updated METRON-760:
-------------------------------------
    Description: 
*Steps to Reproduce*
1. Inject logs of the following kind into the YAF Kafka topic:
ip_src (external IP) -> ip_dst (internal IP)
Here is a sample log:
{code}
2017-02-28 09:20:29.171|2017-02-28 09:20:55.684|   0.322|   0.228|  6|          
               62.75.195.236"|49184|                           192.168.1.1|   
80|       S|     APF|      AS|     APF|92a7a033|00b98442|000|000|       8|     
805|       8|     966|    0|
{code}

2. Wait for indices to be generated
3. Run the "Metron - YAF Telemetry" Zeppelin notebook

The following error is seen in the *Top Talkers - External* and *Top Location* 
paragraphs:

{code}
cannot resolve '`enrichments.geo.ip_dst_addr.country`' given input columns: 
[adapter.geoadapter.end.ts, isn, pkt, enrichmentsplitterbolt.splitter.end.ts, 
enrichments.geo.ip_src_addr.longitude, end_time, ip_dst_port, 
threatinteljoinbolt.joiner.ts, enrichments.geo.ip_src_addr.location_point, 
adapter.geoadapter.begin.ts, riflags, uflags, 
enrichmentsplitterbolt.splitter.begin.ts, risn, iflags, 
enrichments.geo.ip_src_addr.city, rtt, enrichments.geo.ip_src_addr.locID, 
enrichments.geo.ip_src_addr.postalCode, enrichments.geo.ip_src_addr.latitude, 
original_string, threatintelsplitterbolt.splitter.begin.ts, roct, 
threatintelsplitterbolt.splitter.end.ts, 
adapter.hostfromjsonlistadapter.end.ts, tag, 
enrichments.geo.ip_src_addr.country, app, ip_dst_addr, rtag, 
adapter.threatinteladapter.end.ts, ip_src_port, 
adapter.hostfromjsonlistadapter.begin.ts, ip_src_addr, 
enrichments.geo.ip_src_addr.dmaCode, enrichmentjoinbolt.joiner.ts, 
adapter.threatinteladapter.begin.ts, source.type, rpkt, duration, protocol, 
ruflags, start_time, oct, timestamp]; line 8 pos 8
{code}

The same behavior is seen when only messages of the reverse scenario, _ip_src 
(internal IP) -> ip_dst (external IP)_, are injected into YAF.

Note that these errors occur when the YAF topic receives _only_ unidirectional 
messages, i.e. the external IP is always the source or always the destination.
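
To make the unidirectional condition concrete, here is a minimal Python sketch. It assumes that "internal" means a private address, approximated with the standard library's `ipaddress` module; the notebook's own `is_internal` UDF may be defined differently. It also assumes that only external IPs receive geo enrichment fields. `geo_column_families` is a hypothetical helper and not part of Metron.

```python
import ipaddress


def is_internal(ip):
    """Assumed stand-in for the notebook's is_internal UDF: private addresses only."""
    return ipaddress.ip_address(ip).is_private


def geo_column_families(flows):
    """Return the enrichments.geo.* prefixes that would be populated for the
    given (ip_src_addr, ip_dst_addr) pairs, assuming only external IPs are enriched."""
    families = set()
    for src, dst in flows:
        if not is_internal(src):
            families.add("enrichments.geo.ip_src_addr")
        if not is_internal(dst):
            families.add("enrichments.geo.ip_dst_addr")
    return families


# Addresses taken from the sample log above.
inbound_only = [("62.75.195.236", "192.168.1.1")]
mixed = inbound_only + [("192.168.1.1", "62.75.195.236")]

print(sorted(geo_column_families(inbound_only)))  # source prefix only
print(sorted(geo_column_families(mixed)))         # both prefixes
```

With inbound-only traffic just the `ip_src_addr` prefix appears, which matches the column list in the error message above.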

*Possible Root Cause*
For the case ip_src (external IP) -> ip_dst (internal IP), the 
enrichments.geo.ip_dst_addr.* fields never get created, since every ip_dst 
address is internal and receives no geo enrichment. The select statement in the 
following Spark SQL query therefore fails, because it references columns that 
do not exist in the yaf table. The same is true for the reverse unidirectional 
scenario, where the geo enrichments are absent for the internal ip_src addresses.

{code}
%spark.sql

select ip, 
    sum(pkts) as pkts,
    sum(duration) as duration,
    country, 
    city
from (
    select ip_dst_addr as ip,
        `enrichments.geo.ip_dst_addr.country` as country,
        `enrichments.geo.ip_dst_addr.city` as city,
        pkt + rpkt as pkts,
        duration
    from yaf
    where (datediff(current_timestamp(), from_unixtime(timestamp/1000)) <= 7)
    and is_internal(ip_dst_addr) = false
    union all
    select ip_src_addr as ip,
        `enrichments.geo.ip_src_addr.country` as country,
        `enrichments.geo.ip_src_addr.city` as city,
        pkt + rpkt as pkts,
        duration
    from yaf
    where datediff(current_timestamp(), from_unixtime(timestamp/1000)) <= 7
    and is_internal(ip_src_addr) = false
) ips
group by ip, country, city
order by pkts desc
limit 10
{code}
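
The failure mode described above can be reproduced without Spark. The sketch below assumes that the columns of the `yaf` table are simply the union of the fields present in the indexed documents, which is what the column list in the error message suggests. The country and city values are placeholders, the timestamp assumes the sample log's start time is UTC, and `inferred_columns` and `unresolved` are hypothetical helpers.

```python
def inferred_columns(documents):
    """Union of field names across documents (assumed model of schema inference)."""
    columns = set()
    for doc in documents:
        columns.update(doc)
    return columns


def unresolved(required, documents):
    """Columns the query needs that no document provides."""
    return sorted(set(required) - inferred_columns(documents))


# Columns referenced by the Top Talkers - External query.
REQUIRED = [
    "ip_dst_addr", "ip_src_addr", "pkt", "rpkt", "duration", "timestamp",
    "enrichments.geo.ip_dst_addr.country", "enrichments.geo.ip_dst_addr.city",
    "enrichments.geo.ip_src_addr.country", "enrichments.geo.ip_src_addr.city",
]

# One enriched external -> internal flow: only the source IP has geo fields.
doc = {
    "ip_src_addr": "62.75.195.236",
    "ip_dst_addr": "192.168.1.1",
    "pkt": 8,
    "rpkt": 8,
    "duration": 0.322,
    "timestamp": 1488273629171,  # 2017-02-28 09:20:29.171, assuming UTC
    "enrichments.geo.ip_src_addr.country": "XX",        # placeholder value
    "enrichments.geo.ip_src_addr.city": "Example City",  # placeholder value
}

print(unresolved(REQUIRED, [doc]))
```

The two columns reported as unresolved are the `enrichments.geo.ip_dst_addr.*` columns, the first of which is the one named in the error.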


*Workaround*
* Ingesting a mix of events, i.e. both ip_src (internal IP) -> ip_dst (external 
IP) and ip_src (external IP) -> ip_dst (internal IP), resolves the issue. 
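
An alternative that does not depend on the ingested data is to build the query from the columns that actually exist. The following Python sketch is only an illustration of that idea and not the fix adopted in Metron. `branch` and `top_talkers_query` are hypothetical helpers, and the column list is assumed to come from the table schema (for example the `columns` attribute of a PySpark DataFrame).

```python
def branch(direction, columns):
    """Return the sub-select for 'src' or 'dst', or None if its geo columns are absent."""
    prefix = "enrichments.geo.ip_%s_addr" % direction
    needed = {prefix + ".country", prefix + ".city"}
    if not needed <= set(columns):
        return None
    return (
        "select ip_{d}_addr as ip, `{p}.country` as country, `{p}.city` as city, "
        "pkt + rpkt as pkts, duration from yaf "
        "where datediff(current_timestamp(), from_unixtime(timestamp/1000)) <= 7 "
        "and is_internal(ip_{d}_addr) = false"
    ).format(d=direction, p=prefix)


def top_talkers_query(columns):
    """Assemble the query from the resolvable branches; None if there are none."""
    branches = [b for b in (branch("dst", columns), branch("src", columns)) if b]
    if not branches:
        return None
    return (
        "select ip, sum(pkts) as pkts, sum(duration) as duration, country, city "
        "from (" + " union all ".join(branches) + ") ips "
        "group by ip, country, city order by pkts desc limit 10"
    )


# Columns available when only external -> internal traffic was ingested.
src_only = [
    "enrichments.geo.ip_src_addr.country",
    "enrichments.geo.ip_src_addr.city",
]
print(top_talkers_query(src_only))
```

With only the source-side geo columns present, the generated query contains a single sub-select and no reference to `enrichments.geo.ip_dst_addr.*`.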

  was:
*Steps to Reproduce*
1. Inject logs of the following kind into YAF kafka topic
ip_src (external IP) -> ip_dst (internal IP)
Here is a sample log:
{code}
2017-02-28 09:20:29.171|2017-02-28 09:20:55.684|   0.322|   0.228|  6|          
               62.75.195.236"|49184|                           192.168.1.1|   
80|       S|     APF|      AS|     APF|92a7a033|00b98442|000|000|       8|     
805|       8|     966|    0|
{code}

2. Wait for indices to be generated
3. Run the "Metron - YAF Telemetry" Zeppelin notebook

Following errors are seen in The *Top Talkers - External* and *Top Location* 
paragraphs

{code}
cannot resolve '`enrichments.geo.ip_dst_addr.country`' given input columns: 
[adapter.geoadapter.end.ts, isn, pkt, enrichmentsplitterbolt.splitter.end.ts, 
enrichments.geo.ip_src_addr.longitude, end_time, ip_dst_port, 
threatinteljoinbolt.joiner.ts, enrichments.geo.ip_src_addr.location_point, 
adapter.geoadapter.begin.ts, riflags, uflags, 
enrichmentsplitterbolt.splitter.begin.ts, risn, iflags, 
enrichments.geo.ip_src_addr.city, rtt, enrichments.geo.ip_src_addr.locID, 
enrichments.geo.ip_src_addr.postalCode, enrichments.geo.ip_src_addr.latitude, 
original_string, threatintelsplitterbolt.splitter.begin.ts, roct, 
threatintelsplitterbolt.splitter.end.ts, 
adapter.hostfromjsonlistadapter.end.ts, tag, 
enrichments.geo.ip_src_addr.country, app, ip_dst_addr, rtag, 
adapter.threatinteladapter.end.ts, ip_src_port, 
adapter.hostfromjsonlistadapter.begin.ts, ip_src_addr, 
enrichments.geo.ip_src_addr.dmaCode, enrichmentjoinbolt.joiner.ts, 
adapter.threatinteladapter.begin.ts, source.type, rpkt, duration, protocol, 
ruflags, start_time, oct, timestamp]; line 8 pos 8
{code}

The same behavior is also seen when messages of the scenario, _ip_src (internal 
IP) -> ip_dst (external IP)_ are injected into YAF.

Note that these errors are seen when YAF is ingested with _only_ unidirectional 
source messages (either external only source or external only destination)

*Possible Root Cause*
For the case with ip_src(external_ip) -> ip_dst(internal_ip), the 
enrichment.geo.* fields never get created for any of the ip_dst addresses. The 
select statement in the following spark sql query hence fails. Same is true for 
the reverse unidirectional scenario as well.

{code}
%spark.sql

select ip, 
    sum(pkts) as pkts,
    sum(duration) as duration,
    country, 
    city
from (
    select ip_dst_addr as ip,
        `enrichments.geo.ip_dst_addr.country` as country,
        `enrichments.geo.ip_dst_addr.city` as city,
        pkt + rpkt as pkts,
        duration
    from yaf
    where (datediff(current_timestamp(), from_unixtime(timestamp/1000)) <= 7)
    and is_internal(ip_dst_addr) = false
    union all
    select ip_src_addr as ip,
        `enrichments.geo.ip_src_addr.country` as country,
        `enrichments.geo.ip_src_addr.city` as city,
        pkt + rpkt as pkts,
        duration
    from yaf
    where datediff(current_timestamp(), from_unixtime(timestamp/1000)) <= 7
    and is_internal(ip_src_addr) = false
) ips
group by ip, country, city
order by pkts desc
limit 10
{code}


*Workaround*
* Having a mix of event collection, i.e ip_src(internal IP) -> ip_dst(external 
IP) AND  ip_src(external IP) -> ip_dst(internal IP) will resolve the issue. 


> YAF Zeppelin dashboard errors in paragraphs for unidirectional external 
> traffic
> -------------------------------------------------------------------------------
>
>                 Key: METRON-760
>                 URL: https://issues.apache.org/jira/browse/METRON-760
>             Project: Metron
>          Issue Type: Bug
>            Reporter: Anand Subramanian



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
