*Job Title: SRE Architect*

*Duration: Full-time / Long-Term Contract*

*Location: Fremont, CA / Charlotte, NC (Onsite)*

*EMAIL RESUMES TO diceres...@altius.us.com*


*Who are we looking for?*

As a Site Reliability Engineer (SRE), you'll help build and maintain a valuable 
engineering discipline that combines software and systems to develop 
engineering solutions to operations problems. Our support and software 
development focus on improving and optimizing existing systems, building 
infrastructure, and reducing work through automation. You'll join a team of 
curious minds to solve business problems. In this environment, you'll take 
the lead on relevant tasks as an independent contributor who can resolve 
day-to-day business challenges. As an SRE, you'll focus on running 
production applications and systems better, with continuous improvement as 
an ongoing objective.

 

*Your responsibilities:*

   - Responsible for Service Level Agreements (SLA), Service Level 
   Objectives (SLO), and associated metrics for the critical Java 
   applications deployed in the Cloud 
   - Provide Cloud operations management and Cloud services deployments 
   - Ensure Cloud Services availability 
   - Apply strong knowledge of automation and scripting languages (Python) 
   - Work with SRE team to maintain integrity of cloud services deployments 
   - Responsible for CI/CD pipelines across platform and applications 
   - Proactive monitoring, analysis, remediation, and action as needed 
   - Responsible for managing incidents, problems, change management, 
   release management, analytics on previous incidents and usage patterns 
   - Manage new development and enhancements, and operationalize the changes 
   - Manage the On Call staffing plan, roster, allocation of team members, 
   internal and external communication and reporting 
   - Plan and implement patching and upgrades 
   - Analyze system health metrics 
   - Enforce best practices for security, reliability, resiliency, 
   self-healing, HA, automation and quality of service 
   - Establish and follow SRE principles 
   - Coordinate and manage the operational schedules and priorities 
   - Infrastructure Monitoring and Reports for all performance metrics 

 

*Technical Skills:*

   - 15+ years of overall experience, with 5+ years in an SRE Technical 
   Manager role handling IaaS, PaaS and Microservices on PCF / Azure 
   - 4+ years of experience as an SRE engineer in DevOps, DataOps, SecOps 
   or InfraOps 
   - 2+ years of experience in Level 1, 1.5 or 2 support / operations with 
   24x7 support across an onsite/offshore/nearshore model 
   - Experience managing a large global cloud organization working in 
   multiple locations and time zones. 
   - Brings the best of the industry and the organization along in the 
   journey 
   - Good knowledge of Information Technology Infrastructure Library (ITIL) 
   processes 
   - Experience managing SLIs, SLOs, toil, error budgets and metrics 
   - Experience in cloud reliability standards, observability, security, 
   performance, disaster recovery and reporting requirements 
   - Experience identifying manual, repetitive, automatable tasks and 
   automating them 
   - Experience with IT and Cloud security standards and compliance 
   - Hands-on experience working on Java, PCF or Azure platforms 
   - Hands-on experience working with Azure AD 
   - Hands-on experience in automation and scripting using Python 
   - Strong expertise in Cloud concepts like Infrastructure as Code, Cloud 
   Computing, Cloud Networking, Cloud Storage & Backup, Containerization, SSO, 
   sFTP, and SRE 
   - Experience in understanding and implementing SecOps needs 
   - Experience in release and deployment of patches across the full 
   scope of systems 

 

*Process Skills:*

·              Sound knowledge of ITIL practices such as Change 
Management, Incident Management, Problem Management, Release Management, etc.

·              Exceptional communication skills

·              Self-starter, ambitious, willing to take on difficult 
problems

·              Collaborative, team player attitude

·              Practical exposure to and knowledge of existing / emerging 
cloud database technologies.

·              Has worked in a matrix role, with the ability to work 
independently with multiple managers in dotted-line hierarchies.

·              Keeping abreast of industry trends, technology innovation, 
and changing customer requirements to help with the continual service 
improvement process.

·              Participate in on-call rotations and be responsible for 
infrastructure and platform level escalations.

·              Work with the DevOps team on planning and implementation of 
infrastructure capacity planning, upgrades, and monitoring.

·              Participate in Daily (Standup) Production Reviews

·              Contribute to the design and improvement of deployment 
architecture of new and existing applications based on the principles of 
reliability, high availability, efficiency, and observability.

·              Research, learn, adapt, customize, and create tools to 
improve the observability, resilience, and usability of applications in 
scope

·              Create and maintain SRE-related documentation (solution 
repository, Root Cause Analysis Reports etc.)

 

*Certification:*

   - Certification in PCF and Java. 

