s/dagbag import exception/dag import timeout exception/

On Wed, Jul 31, 2019 at 11:17 PM Kevin Yang <yrql...@gmail.com> wrote:

> Hi Jonathan, for your problem, aside from waiting for AIP-24 for the long
> term, you can try setting dagbag_import_timeout
> <https://github.com/apache/airflow/blob/master/airflow/config_templates/default_airflow.cfg#L162>
> to a smaller value so that parsing of those slow DAG files ends sooner.
> Also, I don't think one DAG's parsing can block the parsing of other DAG
> files, even though we parse all of them in a single thread in the
> webserver. All exceptions, including the dagbag import exception, are
> captured and logged
> <https://github.com/apache/airflow/blob/master/airflow/models/dagbag.py#L197-L203>.
>
> Love Dan's ideas and agree with Fokko to start small and expand.
>
> I scanned through PR5701 <https://github.com/apache/airflow/pull/5701> and
> it is exactly what I care about most for this AIP--to me the others come
> naturally after we have a DAG serialization pattern defined. Good job Zhou,
> and pardon me for not having enough bandwidth to review it thoroughly. +1
> for limiting the scope of this AIP to items 1 and 3 in your proposed
> timeline.
>
> Cheers,
> Kevin Y
>
> On Wed, Jul 31, 2019 at 5:11 PM Zhou Fang <zhouf...@google.com.invalid>
> wrote:
>
>> I implemented the first version of the DAG serialization part of AIP-24:
>> https://github.com/apache/airflow/pull/5701. Please take a look if you
>> are interested @all. Thanks!
>>
>> It contains almost all fields of DAGs and tasks in the serialization (an
>> example of a serialized DAG is here:
>> https://github.com/apache/airflow/blob/35e38f19b09646a0f85a2a7866a8d9aacc345252/tests/dags/test_dag_serialization.py#L100).
>> So basically the webserver can still treat them as before. No webserver UI
>> code change is needed. The benefit is that we can use it for 1.10.*.
>>
>> Of course, it is a short-term fix compared to many long-term proposals.
>>
>> It only contains serialization. I verified its usage in the UI end-to-end
>> by using the Async DAG Loader in https://github.com/apache/airflow/pull/5594.
>> I split the DAG serialization out of 5594 since the Async DAG Loader is
>> optional. (I suddenly recall that if there are N webserver processes + 1
>> async DAG loading process, it may solve the webserver inconsistency
>> problem??)
>>
>> On Wed, Jul 31, 2019 at 10:33 AM Tao Feng <fengta...@gmail.com> wrote:
>>
>> > hey Zhou,
>> >
>> > Great to see this happen and that it is backward compatible. I think
>> > persisting DAGs into the DB is definitely needed. And it will make
>> > migration easier with a lightweight approach. At Lyft we sometimes
>> > observe nondeterministically increased scheduling delay once users add
>> > some dynamically generated large DAGs with thousands of tasks.
>> >
>> > I will spend some time looking at your proposal in more detail. But I
>> > agree that this is the most important pain point that we should address.
>> > And let me know if there is anything I can do to help facilitate this.
>> >
>> > On Mon, Jul 29, 2019 at 2:13 PM Zhou Fang <zhouf...@google.com.invalid>
>> > wrote:
>> >
>> > > Thanks everyone for the discussion. The comments are very helpful.
>> > >
>> > > The AIP-24 that we proposed here is really a short-term one, meant to
>> > > minimize the change for a fast launch and compatibility. I agree with
>> > > the benefits of the long-term proposals. It would be great if AIP-24
>> > > can be a first step (if we can agree on the basic serialization
>> > > approach). Then we can gradually apply long-term fixes.
>> > >
>> > > I summarized a few long-term proposals (from Fokko and Ash) and added
>> > > a 'timeline' to AIP-24 (to make things clearer):
>> > >
>> > > *Terms*
>> > >
>> > >    - (this) stringified DAG: a patch to the current DAG that can be
>> > >      JSONified
>> > >    - (long-term) serialized DAG: a new serializable DAG class used by
>> > >      webserver/scheduler
>> > >
>> > > *Proposed timeline*
>> > >
>> > >    1. (this) JSON Serialization of DAGs
>> > >       1. will be out with https://github.com/apache/airflow/pull/5594
>> > >
>> > >    2. (this, optional) Asynchronous DAG loading in webserver
>> > >       1. the webserver process uses a background process to collect
>> > >          DAGs, solving the scalability issue before DAG persistence in
>> > >          DB is out
>> > >       2. the webserver process itself does not need to restart every
>> > >          30s to collect DAGs
>> > >       3. will be out with https://github.com/apache/airflow/pull/5594
>> > >
>> > >    3. (this) DAG persistence in DB for webserver
>> > >       1. minimal Airflow code change
>> > >       2. an optional feature enabled via configuration
>> > >       3. rolled out with Airflow 1.10.5
>> > >
>> > >    4. (this, optional) Using DAG cached in DB for scheduling
>> > >
>> > >    5. (long-term) Defining serialized DAG for webserver
>> > >       1. this proposal keeps all fields of DAG/Operator; however, some
>> > >          fields are not used by the webserver or scheduler
>> > >       2. trimming these fields is easy: just provide a list of fields
>> > >          to include or exclude (Sec 2.3):
>> > >          _serialize_object(x, visited_dags) =>
>> > >          _serialize_object(x, visited_dags, include=['foo'],
>> > >          exclude=['bar'])
>> > >       3. we should carefully check all webserver/scheduler code to
>> > >          make sure trimmed fields are not used, e.g., *task.owner* is
>> > >          used in the webserver
>> > >
>> > >    6. (long-term) Defining serialized DAG for scheduler
>> > >       1. once we have 'stringified DAG' or 'serialized DAG', the
>> > >          SimpleDAG/SimpleTaskInstance used by the scheduler are not
>> > >          needed
>> > >       2. adding more fields to stringified DAGs to be compatible with
>> > >          the scheduler
>> > >
>> > >    7. (long-term) Directly reading DAGs from DB in webserver
>> > >       1. let the webserver process fetch data from the DB, instead of
>> > >          making a DAG bag and refreshing it
>> > >       2. it solves the webserver inconsistency issue
>> > >
>> > >    8. (long-term) Event-driven DAG parsing
>> > >       1. instead of polling DAG files for updating/deleting DAGs,
>> > >          event-based approaches, *e.g.*, inotify
>> > >          (https://pypi.org/project/inotify_simple/), can be used
>> > >
>> > > On Mon, Jul 29, 2019 at 3:23 AM Kaxil Naik <kaxiln...@gmail.com> wrote:
>> > >
>> > > > Thanks all for the input and thanks Zhou too for the detailed AIP.
>> > > >
>> > > > The WIP PR can be a good first step to overall optimization.
>> > > >
>> > > > Let's sync up on the progress you have already made & what we want
>> > > > to target.
>> > > >
>> > > > @Jarek Potiuk <jarek.pot...@polidea.com> & @Fokko - If we manage to
>> > > > make it entirely backward-compatible with an enable/disable flag as
>> > > > we mentioned, we can think of including it in 1.10.5, but I am in
>> > > > favor of removing / cleaning up stuff like pickles, dropping Py 2.0,
>> > > > cutting Airflow 2.0, and including this change there.
>> > > >
>> > > > On Mon, Jul 29, 2019 at 1:03 PM Jarek Potiuk
>> > > > <jarek.pot...@polidea.com> wrote:
>> > > >
>> > > > > Actually I have also been doing a lot of v1-10-test merges over
>> > > > > the last few months (probably several tens of them already). In
>> > > > > fact, the conflicts are rarely difficult to solve. We usually have
>> > > > > small, localised changes, and until we go for full Black file
>> > > > > re-formatting we should be ok (and the change from Zhou seems
>> > > > > rather small and localised).
>> > > > >
>> > > > > J.
>> > > > >
>> > > > > On Mon, Jul 29, 2019 at 9:25 AM Driesprong, Fokko
>> > > > > <fo...@driesprong.frl> wrote:
>> > > > >
>> > > > > > I would be hesitant to merge it into 1.10.5. When I try to
>> > > > > > backport anything into the 1.x branch, I get a whole bunch of
>> > > > > > merge conflicts, even on the trivial tickets.
>> > > > > > For me, the only one who can really comment on this would be
>> > > > > > Ash, since he's doing the bulk of the conflict resolving. Apart
>> > > > > > from that, I'm really excited to make this happen!
>> > > > > >
>> > > > > > Cheers, Fokko
>> > > > > >
>> > > > > > On Sun, Jul 28, 2019 at 20:23, Jarek Potiuk
>> > > > > > <jarek.pot...@polidea.com> wrote:
>> > > > > >
>> > > > > > > Some thoughts I have after looking at the proposal from Zhou.
>> > > > > > >
>> > > > > > > I think this is one of the most important things feature-wise
>> > > > > > > for Airflow. It looks like we have several in-progress
>> > > > > > > attempts to solve the problem and I guess we should agree on a
>> > > > > > > common approach.
>> > > > > > >
>> > > > > > > I very much like Zhou's approach (AIP-24). It does seem to
>> > > > > > > minimise the changes needed in Airflow, and with some
>> > > > > > > optimisations (the caching mentioned by Fokko) it can solve
>> > > > > > > the major pain points, I think relatively quickly, and is
>> > > > > > > potentially portable to 1.10.5 if we have it.
>> > > > > > >
>> > > > > > > I wonder how much it overlaps with or differs from Kaxil's and
>> > > > > > > Ash's ideas. If I read it correctly, it sounds like that idea
>> > > > > > > will contain some more "fundamental" changes: ones that are
>> > > > > > > likely less backwards-compatible and potentially take longer
>> > > > > > > to implement and test, and that likely solve some of the
>> > > > > > > problems better or even solve other problems. Am I right with
>> > > > > > > my assumptions?
>> > > > > > >
>> > > > > > > I think more information on this might be helpful so that we
>> > > > > > > all know if those are two different AIPs, or whether they can
>> > > > > > > be joined in one effort, and how they relate to AIP-18/AIP-19
>> > > > > > > (should those be deprecated or independently implemented?).
>> > > > > > > Also, since the 2.0.0 release is half a year ahead, we should
>> > > > > > > consider how it impacts the roadmap.
>> > > > > > >
>> > > > > > > I can see three approaches here that we as a community can
>> > > > > > > follow (maybe I am missing some :) ):
>> > > > > > >
>> > > > > > > 1) focus our work on a single "complete" solution that will
>> > > > > > > take longer and targets 2.0.0.
>> > > > > > > 2) work on two of them: one quick/fast, potentially portable
>> > > > > > > to 1.10.5, and one longer-term for 2.0.0.
>> > > > > > > 3) decide that the simple solution we have from Zhou (maybe
>> > > > > > > with some modifications) is our target solution (for both
>> > > > > > > 1.10.5, if we have it, and 2.0.0).
>> > > > > > >
>> > > > > > > J.
>> > > > > > >
>> > > > > > > On Sat, Jul 27, 2019 at 11:43 AM Kevin Yang
>> > > > > > > <yrql...@gmail.com> wrote:
>> > > > > > >
>> > > > > > > > Nice job Zhou!
>> > > > > > > >
>> > > > > > > > Really excited, exactly what we wanted for the webserver
>> > > > > > > > scaling issue. I want to add another big driver that made
>> > > > > > > > Airbnb start thinking about this previously, in support of
>> > > > > > > > the effort: it can bring consistency not only between
>> > > > > > > > webservers but also between the webserver and the
>> > > > > > > > scheduler/workers.
>> > > > > > > > It may be less of a problem if the total DAG parsing time
>> > > > > > > > is small, but for us the total DAG parsing time is 15+ mins
>> > > > > > > > and we had to set the webserver (gunicorn subprocesses)
>> > > > > > > > restart interval to 20 mins, which leads to a worst-case
>> > > > > > > > 15+20+15=50 min delay between the scheduler starting to
>> > > > > > > > schedule things and users being able to see their deployed
>> > > > > > > > DAGs/changes...
>> > > > > > > >
>> > > > > > > > I'm not so sure about the scheduler performance
>> > > > > > > > improvement: currently we already feed the main scheduler
>> > > > > > > > process with SimpleDag through DagFileProcessorManager
>> > > > > > > > running in a subprocess--in the future we would feed it
>> > > > > > > > with data from the DB, which is likely slower (though the
>> > > > > > > > difference should have negligible impact on scheduler
>> > > > > > > > performance). In fact, if we keep the existing behavior and
>> > > > > > > > try to schedule only freshly parsed DAGs, then we may need
>> > > > > > > > to deal with a consistency issue--the DAG processor and the
>> > > > > > > > scheduler race to update the flag indicating whether the
>> > > > > > > > DAG is newly parsed. No big deal there, just some thoughts
>> > > > > > > > off the top of my head that hopefully can be helpful.
>> > > > > > > >
>> > > > > > > > And good idea on pre-rendering the template; I believe
>> > > > > > > > template rendering was the biggest concern in the previous
>> > > > > > > > discussion.
>> > > > > > > > We've also chosen the pre-rendering+JSON approach in our
>> > > > > > > > smart sensor API
>> > > > > > > > <https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-17+Airflow+sensor+optimization>
>> > > > > > > > and it seems to be working fine--a supporting case for your
>> > > > > > > > proposal ;) There's a WIP PR
>> > > > > > > > <https://github.com/apache/airflow/pull/5499> for it just
>> > > > > > > > in case you are interested--maybe we can even share some
>> > > > > > > > logic.
>> > > > > > > >
>> > > > > > > > Thumbs-up again for this, and please don't hesitate to
>> > > > > > > > reach out if you want to discuss further with us or need
>> > > > > > > > any help from us.
>> > > > > > > >
>> > > > > > > > Cheers,
>> > > > > > > > Kevin Y
>> > > > > > > >
>> > > > > > > > On Sat, Jul 27, 2019 at 12:54 AM Driesprong, Fokko
>> > > > > > > > <fo...@driesprong.frl> wrote:
>> > > > > > > >
>> > > > > > > > > Looks great Zhou,
>> > > > > > > > >
>> > > > > > > > > One thing popped into my mind while reading the AIP:
>> > > > > > > > > whether we should keep the caching at the webserver
>> > > > > > > > > level. As the famous quote goes: *"There are only two
>> > > > > > > > > hard things in Computer Science: cache invalidation and
>> > > > > > > > > naming things." -- Phil Karlton*
>> > > > > > > > >
>> > > > > > > > > Right now, the fundamental change that is being proposed
>> > > > > > > > > in the AIP is fetching the DAGs from the database in a
>> > > > > > > > > serialized format, and not parsing the Python files all
>> > > > > > > > > the time.
>> > > > > > > > > This will already give a great performance improvement
>> > > > > > > > > on the webserver side because it removes a lot of the
>> > > > > > > > > processing. However, since we're still fetching the DAGs
>> > > > > > > > > from the database at a regular interval and caching them
>> > > > > > > > > in the local process, we still have the two issues that
>> > > > > > > > > Airflow is suffering from right now:
>> > > > > > > > >
>> > > > > > > > >    1. No snappy UI, because it is still polling the
>> > > > > > > > >       database at a regular interval.
>> > > > > > > > >    2. Inconsistency between webservers, because they
>> > > > > > > > >       might poll at different intervals. I think we've
>> > > > > > > > >       all seen this:
>> > > > > > > > >       https://www.youtube.com/watch?v=sNrBruPS3r4
>> > > > > > > > >
>> > > > > > > > > As I also mentioned in the Slack channel, I strongly
>> > > > > > > > > feel that we should be able to render most views from
>> > > > > > > > > the tables in the database, so without touching the
>> > > > > > > > > blob. For specific views, we could just pull the blob
>> > > > > > > > > from the database. In this case we always have the
>> > > > > > > > > latest version, and we tackle the second point above.
>> > > > > > > > >
>> > > > > > > > > To tackle the first one, I also have an idea. We should
>> > > > > > > > > change the DAG parser from a loop to something that uses
>> > > > > > > > > inotify (https://pypi.org/project/inotify_simple/). This
>> > > > > > > > > will change it from polling to an event-driven design,
>> > > > > > > > > which is much more performant and less resource hungry.
>> > > > > > > > > But this would be an AIP on its own.
>> > > > > > > > >
>> > > > > > > > > Again, great design and a comprehensive AIP, but I would
>> > > > > > > > > include the caching on the webserver to greatly improve
>> > > > > > > > > the user experience in the UI. Looking forward to the
>> > > > > > > > > opinion of others on this.
>> > > > > > > > >
>> > > > > > > > > Cheers, Fokko
>> > > > > > > > >
>> > > > > > > > > On Sat, Jul 27, 2019 at 01:44, Zhou Fang
>> > > > > > > > > <zhouf...@google.com.invalid> wrote:
>> > > > > > > > >
>> > > > > > > > > > Hi Kaxil,
>> > > > > > > > > >
>> > > > > > > > > > Just sent out the AIP:
>> > > > > > > > > >
>> > > > > > > > > > https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-24+DAG+Persistence+in+DB+using+JSON+for+Airflow+Webserver+and+%28optional%29+Scheduler
>> > > > > > > > > >
>> > > > > > > > > > Thanks!
>> > > > > > > > > > Zhou
>> > > > > > > > > >
>> > > > > > > > > > On Fri, Jul 26, 2019 at 1:33 PM Zhou Fang
>> > > > > > > > > > <zhouf...@google.com> wrote:
>> > > > > > > > > >
>> > > > > > > > > > > Hi Kaxil,
>> > > > > > > > > > >
>> > > > > > > > > > > We are also working on persisting DAGs into the DB
>> > > > > > > > > > > using JSON for the Airflow webserver in Google
>> > > > > > > > > > > Composer. We aim to minimize the change to the
>> > > > > > > > > > > current Airflow code. Happy to get synced on this!
>> > > > > > > > > > >
>> > > > > > > > > > > Here is our progress:
>> > > > > > > > > > > (1) Serializing DAGs using Pickle to be used in
>> > > > > > > > > > > webserver
>> > > > > > > > > > > It has been launched in Composer.
>> > > > > > > > > > > I am working on the PR to upstream it:
>> > > > > > > > > > > https://github.com/apache/airflow/pull/5594
>> > > > > > > > > > > Currently it does not support non-Airflow operators
>> > > > > > > > > > > and we are working on a fix.
>> > > > > > > > > > >
>> > > > > > > > > > > (2) Caching Pickled DAGs in DB to be used by
>> > > > > > > > > > > webserver
>> > > > > > > > > > > We have a proof-of-concept implementation and are
>> > > > > > > > > > > working on an AIP now.
>> > > > > > > > > > >
>> > > > > > > > > > > (3) Using JSON instead of Pickle in (1) and (2)
>> > > > > > > > > > > We decided to use JSON because Pickle is neither
>> > > > > > > > > > > secure nor human readable. The serialization
>> > > > > > > > > > > approach is very similar to (1).
>> > > > > > > > > > >
>> > > > > > > > > > > I will update the PR
>> > > > > > > > > > > (https://github.com/apache/airflow/pull/5594) to
>> > > > > > > > > > > replace Pickle with JSON, and send our design of (2)
>> > > > > > > > > > > as an AIP next week. Glad to check together whether
>> > > > > > > > > > > our implementation makes sense and to make
>> > > > > > > > > > > improvements on it.
>> > > > > > > > > > >
>> > > > > > > > > > > Thanks!
>> > > > > > > > > > > Zhou
>> > > > > > > > > > >
>> > > > > > > > > > > On Fri, Jul 26, 2019 at 7:37 AM Kaxil Naik
>> > > > > > > > > > > <kaxiln...@gmail.com> wrote:
>> > > > > > > > > > >
>> > > > > > > > > > >> Hi all,
>> > > > > > > > > > >>
>> > > > > > > > > > >> We, at Astronomer, are going to spend time working
>> > > > > > > > > > >> on DAG Serialisation.
>> > > > > > > > > > >> There are 2 AIPs that are somewhat related to what
>> > > > > > > > > > >> we plan to work on:
>> > > > > > > > > > >>
>> > > > > > > > > > >>    - AIP-18 Persist all information from DAG file in DB
>> > > > > > > > > > >>      <https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-18+Persist+all+information+from+DAG+file+in+DB>
>> > > > > > > > > > >>    - AIP-19 Making the webserver stateless
>> > > > > > > > > > >>      <https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-19+Making+the+webserver+stateless>
>> > > > > > > > > > >>
>> > > > > > > > > > >> We plan to use JSON as the Serialisation format and
>> > > > > > > > > > >> store it as a blob in metadata DB.
>> > > > > > > > > > >>
>> > > > > > > > > > >> *Goals:*
>> > > > > > > > > > >>
>> > > > > > > > > > >>    - Make Webserver Stateless
>> > > > > > > > > > >>    - Use the same version of the DAG across
>> > > > > > > > > > >>      Webserver & Scheduler
>> > > > > > > > > > >>    - Keep backward compatibility and have a flag
>> > > > > > > > > > >>      (globally & at DAG level) to turn this feature
>> > > > > > > > > > >>      on/off
>> > > > > > > > > > >>    - Enable DAG Versioning (extended Goal)
>> > > > > > > > > > >>
>> > > > > > > > > > >> We will be preparing a proposal (AIP) after some
>> > > > > > > > > > >> research and some initial work and open it for the
>> > > > > > > > > > >> suggestions of the community.
>> > > > > > > > > > >>
>> > > > > > > > > > >> We already had some good brainstorming sessions
>> > > > > > > > > > >> with Twitter folks (DanD & Sumit), folks from
>> > > > > > > > > > >> GoDataDriven (Fokko & Bas) & Alex (from Uber), which
>> > > > > > > > > > >> will be a good starting point for us.
>> > > > > > > > > > >>
>> > > > > > > > > > >> If anyone in the community is interested in it or
>> > > > > > > > > > >> has some experience with the same and wants to
>> > > > > > > > > > >> collaborate, please let me know and join the
>> > > > > > > > > > >> #dag-serialisation channel on Airflow Slack.
>> > > > > > > > > > >>
>> > > > > > > > > > >> Regards,
>> > > > > > > > > > >> Kaxil
>> > > > > > >
>> > > > > > > --
>> > > > > > >
>> > > > > > > Jarek Potiuk
>> > > > > > > Polidea <https://www.polidea.com/> | Principal Software Engineer
>> > > > > > >
>> > > > > > > M: +48 660 796 129
>> > > > >
>> > > > > --
>> > > > >
>> > > > > Jarek Potiuk
>> > > > > Polidea <https://www.polidea.com/> | Principal Software Engineer
>> > > > >
>> > > > > M: +48 660 796 129
>> > > >
>> > > > --
>> > > > *Kaxil Naik*
>> > > > *Big Data Consultant | DevOps Data Engineer*
>> > > > *Certified* Google Cloud Data Engineer | *Certified* Apache Spark &
>> > > > Neo4j Developer
>> > > > *LinkedIn*: https://www.linkedin.com/in/kaxil
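
Item 5 of the proposed timeline in this thread mentions trimming a serialized DAG by passing a list of fields to include or exclude. A minimal sketch of that idea follows, assuming plain Python objects and JSON output. It is not the code from PR 5701: ToyTask and serialize_fields are hypothetical names used only for illustration, and the real _serialize_object helper (which also takes visited_dags) may behave differently.

```python
import json


class ToyTask:
    """Hypothetical stand-in for an operator; not an Airflow class."""

    def __init__(self, task_id, owner, retries):
        self.task_id = task_id
        self.owner = owner
        self.retries = retries


def serialize_fields(obj, include=None, exclude=None):
    """Return obj's public attributes as a dict, trimmed by include/exclude.

    Hypothetical helper: `include` keeps only the named fields,
    `exclude` drops the named fields; both are optional.
    """
    fields = {k: v for k, v in vars(obj).items() if not k.startswith("_")}
    if include is not None:
        fields = {k: v for k, v in fields.items() if k in include}
    if exclude is not None:
        fields = {k: v for k, v in fields.items() if k not in exclude}
    return fields


task = ToyTask("extract", owner="airflow", retries=3)

# Full serialization keeps every public field.
full_json = json.dumps(serialize_fields(task), sort_keys=True)

# Trimmed serialization drops a field that a consumer would not need.
trimmed_json = json.dumps(serialize_fields(task, exclude=["retries"]), sort_keys=True)

print(full_json)
print(trimmed_json)
```

As the timeline notes, a field such as task.owner is still used by the webserver, so any exclude list would have to be checked against every consumer of the serialized form before fields are dropped.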